API Overview

The DOCC REST API provides programmatic access to all platform functionality. Use these endpoints to integrate DOCC with your existing tools and workflows.

Base URL

All API requests should be made to:

https://your-docc-instance.com/api/v1

API Characteristics

  • RESTful Design: Standard HTTP methods (GET, POST, PUT, DELETE)
  • JSON Format: All requests and responses use JSON
  • Stateless: Each request must include authentication
  • Rate Limited: 1000 requests per minute per API key
  • Versioned: API version specified in URL path

Authentication

The DOCC API supports multiple authentication methods. Choose the one that best fits your use case.

API Keys

The simplest authentication method for server-to-server integrations:

curl -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     https://your-docc-instance.com/api/v1/datasets

API Key Security

Keep your API keys secure. Never expose them in client-side code or public repositories. Rotate keys regularly and use environment variables for storage.
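
For example, here is a minimal Python sketch using the requests library that follows this advice by reading the key from an environment variable instead of hard-coding it. The DOCC_API_KEY variable name is an assumption; use whatever name your deployment standardizes on.

import os

import requests

# Assumed environment variable name; keep the key out of source control.
API_KEY = os.environ["DOCC_API_KEY"]
BASE_URL = "https://your-docc-instance.com/api/v1"

response = requests.get(
    f"{BASE_URL}/datasets",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())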

OAuth 2.0

For applications that need to act on behalf of users:

Grant Type          Use Case            Token Endpoint
Authorization Code  Web applications    /oauth/token
Client Credentials  Machine-to-machine  /oauth/token
Refresh Token       Token renewal       /oauth/refresh
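
As a sketch, the Client Credentials flow from the table above might look like this in Python. The /oauth/token path comes from the table; the form fields follow the standard OAuth 2.0 client-credentials request and should be checked against your instance.

import requests

BASE_URL = "https://your-docc-instance.com"

# Standard OAuth 2.0 client-credentials request; verify the exact field
# names your DOCC token endpoint expects.
token_response = requests.post(
    f"{BASE_URL}/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
    },
    timeout=30,
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Use the token like an API key on subsequent requests.
headers = {"Authorization": f"Bearer {access_token}"}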

JWT Tokens

For enterprise integrations with existing identity providers:

{ "alg": "RS256", "typ": "JWT" } { "iss": "your-identity-provider", "sub": "user@company.com", "aud": "docc-api", "exp": 1234567890, "iat": 1234567890, "scope": "read:datasets write:pipelines" }

Data Management APIs

Manage datasets, schemas, and metadata through these endpoints.

Datasets

Create, retrieve, update, and delete dataset configurations.

Method  Endpoint        Description
GET     /datasets       List all datasets
POST    /datasets       Create new dataset
GET     /datasets/{id}  Get dataset details
PUT     /datasets/{id}  Update dataset
DELETE  /datasets/{id}  Delete dataset

Example: Create Dataset

POST /api/v1/datasets
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "name": "customer_data",
  "description": "Customer information dataset",
  "source": {
    "type": "database",
    "connection": "postgresql://user:pass@host:5432/db",
    "table": "customers"
  },
  "schema": {
    "fields": [
      {"name": "id", "type": "integer", "nullable": false},
      {"name": "name", "type": "string", "nullable": false},
      {"name": "email", "type": "string", "nullable": true}
    ]
  },
  "tags": ["customer", "pii"]
}

Response

{ "id": "ds_123456789", "name": "customer_data", "description": "Customer information dataset", "status": "active", "created_at": "2024-01-15T10:30:00Z", "updated_at": "2024-01-15T10:30:00Z", "source": { "type": "database", "connection": "postgresql://user:***@host:5432/db", "table": "customers" }, "schema": { "fields": [ {"name": "id", "type": "integer", "nullable": false}, {"name": "name", "type": "string", "nullable": false}, {"name": "email", "type": "string", "nullable": true} ] }, "tags": ["customer", "pii"], "quality_score": null, "last_profiled": null }

Schema Management

Manage dataset schemas and track schema evolution:

Method  Endpoint                        Description
GET     /datasets/{id}/schema           Get current schema
PUT     /datasets/{id}/schema           Update schema
GET     /datasets/{id}/schema/history   Schema evolution history
POST    /datasets/{id}/schema/validate  Validate schema changes
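
A sketch of validating a proposed schema change before applying it. The request body for the validate endpoint is not documented here, so the payload below assumes it accepts a schema object shaped like the dataset schema shown earlier; confirm the format against your instance.

import os

import requests

BASE_URL = "https://your-docc-instance.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['DOCC_API_KEY']}"}

# Proposed change: make "email" non-nullable and add a "phone" field.
# The body format is an assumption based on the dataset schema example above.
proposed_schema = {
    "fields": [
        {"name": "id", "type": "integer", "nullable": False},
        {"name": "name", "type": "string", "nullable": False},
        {"name": "email", "type": "string", "nullable": False},
        {"name": "phone", "type": "string", "nullable": True},
    ]
}

response = requests.post(
    f"{BASE_URL}/datasets/ds_123456789/schema/validate",
    json=proposed_schema,
    headers=headers,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # validation result; exact shape not documented here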

Pipeline Management APIs

Create and manage data processing pipelines programmatically.

Pipeline Operations

Method  Endpoint                  Description
GET     /pipelines                List all pipelines
POST    /pipelines                Create new pipeline
GET     /pipelines/{id}           Get pipeline details
PUT     /pipelines/{id}           Update pipeline
POST    /pipelines/{id}/run       Execute pipeline
POST    /pipelines/{id}/schedule  Schedule pipeline

Example: Create Pipeline

POST /api/v1/pipelines
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "name": "customer_data_processing",
  "description": "Process and clean customer data",
  "steps": [
    {
      "id": "input",
      "type": "data_source",
      "config": {
        "dataset_id": "ds_123456789"
      }
    },
    {
      "id": "clean",
      "type": "data_cleaner",
      "config": {
        "remove_duplicates": true,
        "handle_nulls": "drop"
      }
    },
    {
      "id": "output",
      "type": "data_sink",
      "config": {
        "destination": "warehouse.cleaned_customers"
      }
    }
  ],
  "schedule": {
    "cron": "0 2 * * *",
    "timezone": "UTC"
  }
}
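
Once a pipeline exists, a run can be triggered on demand via the run endpoint from the table above. A minimal Python sketch, assuming you have the pipeline ID returned at creation time and that the endpoint needs no request body:

import os

import requests

BASE_URL = "https://your-docc-instance.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['DOCC_API_KEY']}"}  # assumed variable name

pipeline_id = "YOUR_PIPELINE_ID"  # id returned when the pipeline was created

# Trigger an ad-hoc execution outside the cron schedule.
run = requests.post(f"{BASE_URL}/pipelines/{pipeline_id}/run", headers=headers, timeout=30)
run.raise_for_status()
print(run.json())  # execution details; exact response shape not documented here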

Job Execution & Monitoring

Track pipeline executions and monitor job status:

Method  Endpoint           Description
GET     /jobs              List job executions
GET     /jobs/{id}         Get job details
POST    /jobs/{id}/cancel  Cancel running job
GET     /jobs/{id}/logs    Get job execution logs
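
A sketch of polling a job until it finishes and then fetching its logs. The "status" field and its values ("queued", "running") are assumptions; check the actual job payload returned by your instance.

import os
import time

import requests

BASE_URL = "https://your-docc-instance.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['DOCC_API_KEY']}"}

job_id = "YOUR_JOB_ID"

# Poll until the job leaves a running state (field names assumed).
while True:
    job = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=headers, timeout=30)
    job.raise_for_status()
    status = job.json().get("status")
    if status not in ("queued", "running"):
        break
    time.sleep(10)

logs = requests.get(f"{BASE_URL}/jobs/{job_id}/logs", headers=headers, timeout=30)
logs.raise_for_status()
print(logs.text)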

Quality Management APIs

Monitor and manage data quality through automated rules and reporting.

Quality Rules

Method  Endpoint                 Description
GET     /quality/rules           List quality rules
POST    /quality/rules           Create quality rule
GET     /quality/rules/{id}      Get rule details
PUT     /quality/rules/{id}      Update rule
POST    /quality/rules/{id}/run  Execute rule check

Example: Create Quality Rule

POST /api/v1/quality/rules
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "name": "email_format_check",
  "description": "Validate email format in customer data",
  "dataset_id": "ds_123456789",
  "rule_type": "regex_match",
  "config": {
    "column": "email",
    "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
    "threshold": 0.95
  },
  "severity": "warning",
  "enabled": true
}
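
Once created, a rule can be evaluated on demand via the run endpoint listed above. A minimal Python sketch, assuming the rule ID returned at creation time:

import os

import requests

BASE_URL = "https://your-docc-instance.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['DOCC_API_KEY']}"}

rule_id = "YOUR_RULE_ID"  # id returned when the rule was created

# Run the check immediately instead of waiting for a scheduled evaluation.
check = requests.post(f"{BASE_URL}/quality/rules/{rule_id}/run", headers=headers, timeout=30)
check.raise_for_status()
print(check.json())  # pass/fail details; exact response shape not documented here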

Error Handling

The DOCC API uses standard HTTP status codes and provides detailed error information:

Status Code  Meaning                Description
200          OK                     Request successful
201          Created                Resource created successfully
400          Bad Request            Invalid request parameters
401          Unauthorized           Authentication required
403          Forbidden              Insufficient permissions
404          Not Found              Resource not found
429          Too Many Requests      Rate limit exceeded
500          Internal Server Error  Server error occurred

Error Response Format

{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid dataset configuration", "details": { "field": "source.connection", "reason": "Invalid connection string format" }, "request_id": "req_123456789" } }

Rate Limiting

API requests are rate limited to ensure fair usage and system stability:

Rate Limits

  • Standard API: 1,000 requests per minute
  • Bulk Operations: 100 requests per minute
  • Authentication: 10 requests per minute

Rate limit headers are included in all responses:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1234567890
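
A sketch of a rate-limit-aware retry in Python. It treats X-RateLimit-Reset as a Unix timestamp, which is an assumption based on the example value above; adjust the wait calculation if your instance returns seconds-until-reset instead.

import os
import time

import requests

BASE_URL = "https://your-docc-instance.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['DOCC_API_KEY']}"}

def get_with_backoff(path: str, max_attempts: int = 5) -> requests.Response:
    """Retry on 429, waiting until the rate limit window resets."""
    for attempt in range(max_attempts):
        response = requests.get(f"{BASE_URL}{path}", headers=headers, timeout=30)
        if response.status_code != 429:
            return response
        # X-RateLimit-Reset treated as a Unix timestamp (assumption);
        # fall back to exponential backoff if the header is missing.
        reset = response.headers.get("X-RateLimit-Reset")
        wait = max(int(reset) - time.time(), 1) if reset else 2 ** attempt
        time.sleep(wait)
    return response

datasets = get_with_backoff("/datasets")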

SDKs & Libraries

Use our official SDKs for easier integration:

Python SDK

Full-featured Python library with async support.

pip install docc-python-sdk

JavaScript SDK

TypeScript-ready SDK for Node.js and browsers.

npm install @giboondata/docc-sdk

Java SDK

Enterprise-ready Java library with Spring Boot integration.

implementation 'com.giboondata:docc-java-sdk:1.0.0'