DOCC Catalog Volume User Guide
Overview
DOCC Catalog Volume provides a governance layer for managing cloud storage access in your data platform. This system separates storage credentials from storage locations, enabling centralized governance, security, and access control across your data infrastructure.
Architecture Overview
The DOCC Catalog system consists of two main components:
Storage Credentials
Secure, reusable authentication configurations for cloud providers
External Locations
Governed pointers to specific storage paths that reference Storage Credentials
Key Benefits
Centralized Management
Single point of control for all storage credentials and access policies
Enhanced Security
Fine-grained access control with workspace-level governance
Audit & Compliance
Comprehensive audit logging and compliance-ready features
Zero-Downtime Rotation
Rotate storage credentials without interrupting data access
Step 1: Prepare AWS S3 Infrastructure
Create S3 Bucket
# Set variables
export BUCKET_NAME="your-data-bucket"
export REGION="us-east-1"
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Create bucket (note: for any region other than us-east-1, you must also pass
# --create-bucket-configuration LocationConstraint=$REGION)
aws s3api create-bucket --bucket $BUCKET_NAME --region $REGION
# Configure bucket security
aws s3api put-public-access-block --bucket $BUCKET_NAME \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
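The best practices later in this guide call for server-side encryption and versioning on every bucket; a minimal sketch of enabling both while the bucket is being set up (SSE-S3 shown; substitute SSE-KMS if your policies require customer-managed keys):
# Enable default server-side encryption (SSE-S3)
aws s3api put-bucket-encryption --bucket $BUCKET_NAME \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
  }'
# Enable versioning for data protection
aws s3api put-bucket-versioning --bucket $BUCKET_NAME \
  --versioning-configuration Status=Enabled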
IAM Policy (Minimal Permissions)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetBucketVersioning"
],
"Resource": "arn:aws:s3:::your-data-bucket"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::your-data-bucket/*"
}
]
}
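Step 2 attaches this policy by ARN, so it needs to be registered first. A sketch, assuming the JSON above has been saved as policy.json (the name DataPlatformS3Policy matches the ARN used in the next step):
# Register the minimal-permissions policy in IAM
aws iam create-policy --policy-name DataPlatformS3Policy \
  --policy-document file://policy.json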
Step 2: Choose Authentication Method
Option A: IAM User (static access keys)
# Create IAM user
aws iam create-user --user-name dataplatform-s3-user
# Attach the minimal-permissions policy from Step 1
aws iam attach-user-policy --user-name dataplatform-s3-user \
--policy-arn arn:aws:iam::$ACCOUNT_ID:policy/DataPlatformS3Policy
# Create access keys (store the returned secret securely)
aws iam create-access-key --user-name dataplatform-s3-user
Option B: IAM Role (assume role, recommended)
# Write the trust policy; replace PLATFORM_ACCOUNT_ID and the external ID with your platform's values
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::PLATFORM_ACCOUNT_ID:root" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "platform-external-id-1234567890" } }
  }]
}
EOF
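The trust policy by itself does not create the role; a short sketch of the remaining IAM calls, assuming the role name DataPlatformS3AssumeRole referenced in Step 3:
# Create the role with the trust policy, then attach the S3 policy
aws iam create-role --role-name DataPlatformS3AssumeRole \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name DataPlatformS3AssumeRole \
  --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/DataPlatformS3Policy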
Step 3: Create Storage Credential
- Navigate to Integration → Data Store → DOCC Catalog Volume
- Click Storage Credentials tab
- Click Create Storage Credential
- Fill in the configuration:
For IAM User:
{
"name": "s3-production-credentials",
"credentialType": "STATIC_CREDENTIALS",
"providerType": "AWS_S3",
"credentialConfig": {
"accessKeyId": "AKIA...",
"secretAccessKey": "...",
"region": "us-east-1"
}
}
For Assume Role:
{
"name": "s3-assume-role",
"credentialType": "ASSUME_ROLE",
"providerType": "AWS_S3",
"credentialConfig": {
"roleArn": "arn:aws:iam::123456789012:role/DataPlatformS3AssumeRole",
"externalId": "platform-external-id-1234567890",
"region": "us-east-1",
"sessionDuration": 3600
}
}
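If you prefer the REST API over the UI, the same payload can be posted to the endpoints documented in the API reference below. A sketch, assuming the assume-role JSON above is saved as credential.json and $PLATFORM_HOST holds your platform's base URL (authentication headers omitted; add whatever your platform requires):
# Create the storage credential, then verify it can reach S3
curl -X POST "$PLATFORM_HOST/api/v1/catalog/storage-credentials" \
  -H "Content-Type: application/json" \
  -d @credential.json
# Test the credential, using the id returned by the create call
curl -X POST "$PLATFORM_HOST/api/v1/catalog/storage-credentials/1/test"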
Step 4: Create External Location
- Click External Locations tab
- Click Create External Location
- Configure the location:
{
"name": "s3-data-lake-bronze",
"description": "Bronze layer of data lake for raw data ingestion",
"url": "s3://your-data-bucket/bronze/",
"storageCredentialId": 1,
"governanceConfig": {
"readOnly": false,
"auditLevel": "FULL",
"retentionDays": 2555,
"complianceTags": ["PII", "GDPR"]
},
"workspaceBindings": {
"allowedWorkspaces": ["data-engineering", "analytics"],
"workspacePermissions": {
"data-engineering": ["READ", "WRITE", "DELETE"],
"analytics": ["READ"]
}
}
}
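External locations can be created the same way over the API. A sketch, assuming the configuration above is saved as location.json (same $PLATFORM_HOST convention as before):
# Create the external location, then run a health check on it
curl -X POST "$PLATFORM_HOST/api/v1/catalog/external-locations" \
  -H "Content-Type: application/json" \
  -d @location.json
# Health-check the location, using the id returned by the create call
curl -X GET "$PLATFORM_HOST/api/v1/catalog/external-locations/1/health"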
Security Best Practices
🔐 Credential Management
- Use Assume Roles over static access keys
- Always use external IDs for assume role configurations
- Grant only minimum required permissions
- Rotate credentials regularly (quarterly recommended)
- Enable CloudTrail logging for all S3 access
- Use VPC endpoints for S3 access when possible (see the sketch after this list)
- Configure network policies to restrict egress traffic
- Ensure all data transfer uses TLS 1.2 or higher
- Consider IP-based access restrictions in IAM policies
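A minimal sketch of the VPC endpoint recommendation, assuming a gateway endpoint and placeholder VPC and route table IDs (substitute your own):
# Route S3 traffic through a gateway VPC endpoint instead of the public internet
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.$REGION.s3 \
  --route-table-ids rtb-0123456789abcdef0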
Data Governance
- Enable server-side encryption on all S3 buckets
- Enable S3 versioning for data protection
- Enable S3 access logging for audit trails
- Configure appropriate lifecycle policies (see the sketch after this list)
- Monitor S3 request metrics and error rates
- Set up billing alerts for unexpected usage
- Monitor for unusual access patterns
- Configure automated health checks
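A sketch of a lifecycle policy for the item above, assuming raw bronze-layer data can move to infrequent access after 90 days (adjust the prefix, age, and storage class to your retention requirements):
aws s3api put-bucket-lifecycle-configuration --bucket $BUCKET_NAME \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "bronze-to-infrequent-access",
      "Filter": { "Prefix": "bronze/" },
      "Status": "Enabled",
      "Transitions": [{ "Days": 90, "StorageClass": "STANDARD_IA" }]
    }]
  }'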
Troubleshooting
Symptom: Storage credential test fails with "Access Denied"
Solutions:
- Verify IAM policy has correct permissions
- Check bucket policy doesn't deny access
- Ensure external ID matches exactly
- Verify the role trust policy allows assume role (see the check after this list)
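For assume-role credentials, you can reproduce the platform's call directly to isolate trust-policy and external-ID problems. A sketch using the role ARN and external ID from Step 3, run from an identity in the trusted account:
# Manually assume the role to confirm the trust policy and external ID
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/DataPlatformS3AssumeRole \
  --role-session-name credential-debug \
  --external-id platform-external-id-1234567890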
Symptom: Timeout errors when accessing S3
Solutions:
- Check platform network policies
- Verify DNS resolution for s3.amazonaws.com
- Ensure outbound HTTPS (443) traffic is allowed (see the checks after this list)
- Consider using VPC endpoints
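Two quick checks for the items above, using standard tools (s3.amazonaws.com is the global endpoint; regional endpoints follow the s3.<region>.amazonaws.com pattern):
# Verify DNS resolution for S3
nslookup s3.amazonaws.com
# Verify outbound HTTPS (443) connectivity
curl -sI https://s3.amazonaws.com --max-time 10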
Symptom: Slow data transfer or high latency
Solutions:
- Choose S3 region closest to platform deployment
- Optimize multipart upload settings
- Use appropriate S3 storage class
- Monitor and adjust concurrent connection limits
Storage Credentials API
# Create storage credential (endpoints shown relative to your platform's base URL)
curl -X POST "$PLATFORM_HOST/api/v1/catalog/storage-credentials" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "s3-credentials",
    "credentialType": "ASSUME_ROLE",
    "providerType": "AWS_S3",
    "credentialConfig": {...}
  }'
# Test storage credential
curl -X POST "$PLATFORM_HOST/api/v1/catalog/storage-credentials/{id}/test"
External Locations API
# Create external location
curl -X POST "$PLATFORM_HOST/api/v1/catalog/external-locations" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "s3-data-location",
    "url": "s3://bucket/path/",
    "storageCredentialId": 1,
    "locationConfig": {...}
  }'
# Health check external location
curl -X GET "$PLATFORM_HOST/api/v1/catalog/external-locations/{id}/health"
Multi-Region Setup
For multi-region deployments, create separate storage credentials for each region:
{
"name": "s3-us-west-2-credentials",
"credentialType": "ASSUME_ROLE",
"providerType": "AWS_S3",
"credentialConfig": {
"roleArn": "arn:aws:iam::123456789012:role/DataPlatformS3Role",
"externalId": "unique-external-id",
"region": "us-west-2"
}
}
Cross-Account Access Patterns
For accessing buckets in different AWS accounts:
- Create role in target account with appropriate permissions
- Configure trust policy to allow assume role from your platform account
- Use the ASSUME_ROLE_CROSS_ACCOUNT credential type
- Specify the target account's role ARN (see the example after this list)
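An illustrative configuration for this pattern, mirroring the assume-role example from Step 3 (the account ID, role name, and external ID are placeholders; the target account's role ARN goes in roleArn):
{
  "name": "s3-cross-account-credentials",
  "credentialType": "ASSUME_ROLE_CROSS_ACCOUNT",
  "providerType": "AWS_S3",
  "credentialConfig": {
    "roleArn": "arn:aws:iam::999999999999:role/PartnerDataAccessRole",
    "externalId": "cross-account-external-id",
    "region": "us-east-1"
  }
}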
Integration with Data Catalogs
DOCC Catalog can integrate with external data catalogs:
{
"catalogIntegration": {
"type": "AWS_GLUE",
"configuration": {
"catalogId": "123456789012",
"databaseName": "data_lake",
"syncMetadata": true
}
}
}
Migrating from DOCCFS Mounts
If you have existing DOCCFS mounts, you can migrate them to DOCC Catalog:
- Audit Existing Mounts: review current mount configurations
- Create Storage Credentials: extract credentials into reusable storage credentials
- Create External Locations: convert mount paths to external locations
- Update Applications: modify applications to use external location references
- Decommission Legacy Mounts: remove old mounts after validation
Prerequisites
Before setting up DOCC Catalog volumes, ensure you have:
- Administrative access to your data platform
- Cloud provider account with appropriate permissions
- Understanding of your organization's data governance policies
- Network connectivity from your platform to your cloud storage
Support and Resources
- Refer to your platform's technical documentation
- Complete API documentation is available in your platform
- Follow cloud provider security best practices
- Contact your system administrator for platform-specific guidance
Note: This guide covers S3 integration. Similar patterns apply to Azure Blob Storage and Google Cloud Storage with provider-specific configurations.