Overview

DOCC Catalog Volume provides a governance layer for managing cloud storage access in your data platform. This system separates storage credentials from storage locations, enabling centralized governance, security, and access control across your data infrastructure.

Architecture Overview

The DOCC Catalog system consists of two main components:

Storage Credentials

Secure, reusable authentication configurations for cloud providers

External Locations

Governed pointers to specific storage paths that reference Storage Credentials

Key Benefits

Centralized Management

Single point of control for all storage credentials and access policies

Enhanced Security

Fine-grained access control with workspace-level governance

Audit & Compliance

Comprehensive audit logging and compliance-ready features

Credential Rotation

Zero-downtime credential rotation without affecting data access

S3 Setup Guide

Step 1: Prepare AWS S3 Infrastructure
Create S3 Bucket
# Set variables
export BUCKET_NAME="your-data-bucket"
export REGION="us-east-1"
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Create bucket (regions other than us-east-1 also require
# --create-bucket-configuration LocationConstraint=$REGION)
aws s3api create-bucket --bucket $BUCKET_NAME --region $REGION

# Configure bucket security
aws s3api put-public-access-block --bucket $BUCKET_NAME \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
IAM Policy (Minimal Permissions)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::your-data-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-data-bucket/*"
    }
  ]
}
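
Assuming the policy JSON above has been saved locally (the file name s3-policy.json is an assumption for illustration), it can be registered as the customer-managed policy that the attach commands in Step 2 reference:

```shell
# Register the minimal policy above as a customer-managed IAM policy.
# Assumes the JSON was saved to s3-policy.json in the working directory.
aws iam create-policy \
  --policy-name DataPlatformS3Policy \
  --policy-document file://s3-policy.json

# The resulting policy ARN is arn:aws:iam::$ACCOUNT_ID:policy/DataPlatformS3Policy
```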

Step 2: Choose Authentication Method
IAM User (Basic)
# Create IAM user
aws iam create-user --user-name dataplatform-s3-user

# Attach policy
aws iam attach-user-policy --user-name dataplatform-s3-user \
  --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/DataPlatformS3Policy

# Create access keys
aws iam create-access-key --user-name dataplatform-s3-user
Assume Role (Recommended)
# Create a trust policy allowing the platform account to assume the role.
# PLATFORM_ACCOUNT_ID is a placeholder for the AWS account your platform runs in;
# the external ID must match the one used in the storage credential configuration.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::PLATFORM_ACCOUNT_ID:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "platform-external-id-1234567890" }
      }
    }
  ]
}
EOF

# Create the role and attach the S3 policy
aws iam create-role --role-name DataPlatformS3AssumeRole \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name DataPlatformS3AssumeRole \
  --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/DataPlatformS3Policy

Step 3: Create Storage Credential
  1. Navigate to Integration → Data Store → DOCC Catalog Volume
  2. Click Storage Credentials tab
  3. Click Create Storage Credential
  4. Fill in the configuration:
For IAM User:
{
  "name": "s3-production-credentials",
  "credentialType": "STATIC_CREDENTIALS",
  "providerType": "AWS_S3",
  "credentialConfig": {
    "accessKeyId": "AKIA...",
    "secretAccessKey": "...",
    "region": "us-east-1"
  }
}
For Assume Role:
{
  "name": "s3-assume-role",
  "credentialType": "ASSUME_ROLE",
  "providerType": "AWS_S3",
  "credentialConfig": {
    "roleArn": "arn:aws:iam::123456789012:role/DataPlatformS3AssumeRole",
    "externalId": "platform-external-id-1234567890",
    "region": "us-east-1",
    "sessionDuration": 3600
  }
}

Step 4: Create External Location
  1. Click External Locations tab
  2. Click Create External Location
  3. Configure the location:
{
  "name": "s3-data-lake-bronze",
  "description": "Bronze layer of data lake for raw data ingestion",
  "url": "s3://your-data-bucket/bronze/",
  "storageCredentialId": 1,
  "governanceConfig": {
    "readOnly": false,
    "auditLevel": "FULL",
    "retentionDays": 2555,
    "complianceTags": ["PII", "GDPR"]
  },
  "workspaceBindings": {
    "allowedWorkspaces": ["data-engineering", "analytics"],
    "workspacePermissions": {
      "data-engineering": ["READ", "WRITE", "DELETE"],
      "analytics": ["READ"]
    }
  }
}

Security Best Practices

Credential Management
  • Prefer assume roles over static access keys
  • Always use external IDs for assume role configurations
  • Grant only minimum required permissions
  • Rotate credentials regularly (quarterly recommended)
  • Enable CloudTrail logging for all S3 access

Network Security
  • Use VPC endpoints for S3 access when possible
  • Configure network policies to restrict egress traffic
  • Ensure all data transfer uses TLS 1.2 or higher
  • Consider IP-based access restrictions in IAM policies
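
To illustrate the last point, an IAM deny statement can restrict bucket access to a known address range. The CIDR below is a placeholder; note that a blanket source-IP deny also blocks traffic arriving through VPC endpoints unless you add a matching aws:SourceVpce exception.

```json
{
  "Effect": "Deny",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::your-data-bucket",
    "arn:aws:s3:::your-data-bucket/*"
  ],
  "Condition": {
    "NotIpAddress": { "aws:SourceIp": ["203.0.113.0/24"] }
  }
}
```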

Data Governance
  • Enable server-side encryption on all S3 buckets
  • Enable S3 versioning for data protection
  • Enable S3 access logging for audit trails
  • Configure appropriate lifecycle policies
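
The first three items can be applied to the bucket from Step 1 with the AWS CLI; this is a sketch using SSE-S3 (AES256) as the encryption default and a placeholder log bucket:

```shell
# Default server-side encryption (SSE-S3; use SSE-KMS if you manage your own keys)
aws s3api put-bucket-encryption --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Versioning for data protection
aws s3api put-bucket-versioning --bucket "$BUCKET_NAME" \
  --versioning-configuration Status=Enabled

# Access logging to a separate log bucket ("your-log-bucket" is a placeholder)
aws s3api put-bucket-logging --bucket "$BUCKET_NAME" \
  --bucket-logging-status \
  '{"LoggingEnabled":{"TargetBucket":"your-log-bucket","TargetPrefix":"s3-access/"}}'
```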

Monitoring & Alerting
  • Monitor S3 request metrics and error rates
  • Set up billing alerts for unexpected usage
  • Monitor for unusual access patterns
  • Configure automated health checks

Common Troubleshooting
Connection Failures

Symptom: Storage credential test fails with "Access Denied"

Solutions:

  • Verify IAM policy has correct permissions
  • Check bucket policy doesn't deny access
  • Ensure external ID matches exactly
  • Verify role trust policy allows assume role
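
These checks can be run from the command line. The role ARN and external ID below are the values from the assume-role example in this guide; substitute your own:

```shell
# 1. Can the role be assumed at all with this external ID?
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/DataPlatformS3AssumeRole \
  --role-session-name docc-credential-test \
  --external-id platform-external-id-1234567890

# 2. With the temporary credentials from step 1 exported
#    (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN),
#    verify that listing the bucket works
aws s3 ls s3://your-data-bucket/ --region us-east-1

# 3. Inspect the role's trust policy for mismatches
aws iam get-role --role-name DataPlatformS3AssumeRole \
  --query 'Role.AssumeRolePolicyDocument'
```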
Network Issues

Symptom: Timeout errors when accessing S3

Solutions:

  • Check platform network policies
  • Verify DNS resolution for s3.amazonaws.com
  • Ensure outbound HTTPS (443) traffic is allowed
  • Consider using VPC endpoints
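
A quick way to isolate the DNS and egress items from the platform host (assumes curl and nslookup are available there):

```shell
# DNS resolution for the S3 endpoint
nslookup s3.amazonaws.com

# Outbound HTTPS reachability: any HTTP status code proves connectivity,
# while a timeout indicates a blocked egress path
curl -sS -o /dev/null -w "HTTP %{http_code}\n" --max-time 10 https://s3.amazonaws.com
```
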
Performance Issues

Symptom: Slow data transfer or high latency

Solutions:

  • Choose S3 region closest to platform deployment
  • Optimize multipart upload settings
  • Use appropriate S3 storage class
  • Monitor and adjust concurrent connection limits
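
Where transfers go through the AWS CLI, its S3 transfer settings can be tuned; the values below are starting points for large-object workloads, not universal recommendations:

```shell
# Raise parallelism and adjust multipart thresholds for large-object transfers
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
```
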
API Reference

Storage Credentials API
# Create storage credential
curl -X POST /api/v1/catalog/storage-credentials \
  -H "Content-Type: application/json" \
  -d '{
    "name": "s3-credentials",
    "credentialType": "ASSUME_ROLE",
    "providerType": "AWS_S3",
    "credentialConfig": {...}
  }'

# Test storage credential
curl -X POST /api/v1/catalog/storage-credentials/{id}/test

External Locations API
# Create external location
curl -X POST /api/v1/catalog/external-locations \
  -H "Content-Type: application/json" \
  -d '{
    "name": "s3-data-location",
    "url": "s3://bucket/path/",
    "storageCredentialId": 1,
    "locationConfig": {...}
  }'

# Health check external location
curl -X GET /api/v1/catalog/external-locations/{id}/health

Advanced Configuration

Multi-Region Setup

For multi-region deployments, create separate storage credentials for each region:

{
  "name": "s3-us-west-2-credentials",
  "credentialType": "ASSUME_ROLE",
  "providerType": "AWS_S3",
  "credentialConfig": {
    "roleArn": "arn:aws:iam::123456789012:role/DataPlatformS3Role",
    "externalId": "unique-external-id",
    "region": "us-west-2"
  }
}

Cross-Account Access Patterns

For accessing buckets in different AWS accounts:

  1. Create role in target account with appropriate permissions
  2. Configure trust policy to allow assume role from your platform account
  3. Use ASSUME_ROLE_CROSS_ACCOUNT credential type
  4. Specify target account's role ARN
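
For step 2, the trust policy on the role in the target account might look like the following, where 111122223333 stands in for your platform's AWS account ID and the external ID is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "unique-external-id" }
      }
    }
  ]
}
```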

Integration with Data Catalogs

DOCC Catalog can integrate with external data catalogs:

{
  "catalogIntegration": {
    "type": "AWS_GLUE",
    "configuration": {
      "catalogId": "123456789012",
      "databaseName": "data_lake",
      "syncMetadata": true
    }
  }
}

Migration from Legacy Mounts

If you have existing DOCCFS mounts, you can migrate them to DOCC Catalog:

  1. Audit Existing Mounts: Review current mount configurations
  2. Create Storage Credentials: Extract credentials into reusable storage credentials
  3. Create External Locations: Convert mount paths to external locations
  4. Update Applications: Modify applications to use external location references
  5. Decommission Legacy Mounts: Remove old mounts after validation

Prerequisites

Before setting up DOCC Catalog volumes, ensure you have:

  • Administrative access to your data platform
  • Cloud provider account with appropriate permissions
  • Understanding of your organization's data governance policies
  • Network connectivity from your platform to your cloud storage

Support and Resources
Documentation

Refer to your platform's technical documentation

API Reference

Complete API documentation available in your platform

Best Practices

Follow cloud provider security best practices

Support

Contact your system administrator for platform-specific guidance

Note: This guide covers S3 integration. Similar patterns apply to Azure Blob Storage and Google Cloud Storage with provider-specific configurations.

Next Steps