Getting Started with DOCC
This guide will help you get up and running with the Data Operation Control Center (DOCC) platform. Follow these steps to set up your environment and create your first data pipeline.
Prerequisites
Before you begin, ensure you have the following:
System Requirements
- Operating System: Linux (Ubuntu 20.04+, CentOS 8+) or macOS 10.15+
- Memory: Minimum 8GB RAM (16GB recommended)
- Storage: At least 50GB free disk space
- Network: Internet connectivity for package downloads
Required Software
| Component | Version | Purpose |
|---|---|---|
| Docker | 20.10+ | Container runtime |
| Docker Compose | 2.0+ | Multi-container orchestration |
| Java | 11 or 17 | Runtime environment |
| Python | 3.8+ | Scripting and notebooks |
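You can confirm the installed versions from a terminal before proceeding:

```bash
# Confirm each prerequisite meets the minimum version
docker --version        # expect 20.10 or later
docker compose version  # expect 2.0 or later
java -version           # expect 11 or 17
python3 --version       # expect 3.8 or later
```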
Installation
DOCC can be installed using Docker Compose for a quick start, or deployed to Kubernetes for production environments.
Quick Start with Docker Compose
For development and testing, use our Docker Compose setup:
```bash
# Download the Docker Compose file
curl -O https://releases.giboondata.com/docker-compose.yml

# Start DOCC services (Compose 2.0+ is invoked as "docker compose")
docker compose up -d

# Verify installation
docker compose ps
```
Installation Complete
Once all services are running, DOCC will be available at http://localhost:8080
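Once docker compose ps shows the services as running, you can confirm the web UI is answering with a plain HTTP check:

```bash
# Print the HTTP status from the DOCC web endpoint (200, or a redirect to the login page, means it is up)
curl -fsS -o /dev/null -w "%{http_code}\n" http://localhost:8080
```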
Production Installation
For production deployments, we recommend using Kubernetes. See our detailed installation guide for complete instructions.
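As a rough outline only (the manifest name below is a placeholder; the actual manifests come from the installation guide), a Kubernetes deployment follows the usual pattern:

```bash
# Placeholder manifest name -- substitute the files from the installation guide
kubectl create namespace docc
kubectl apply -n docc -f docc-production.yaml
kubectl get pods -n docc --watch   # wait until every pod reports Running
```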
First Login
After installation, you'll need to complete the initial setup:
1. Access the Platform
Navigate to http://localhost:8080 in your web browser.
Default Credentials:
- Username: admin
- Password: admin123
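If you want to script the login, and assuming DOCC exposes a JSON login endpoint (the /api/v1/login path below is an illustrative guess, not a documented API), the same credentials would be used like this:

```bash
# Hypothetical endpoint -- consult the API reference for the real path
curl -X POST http://localhost:8080/api/v1/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin123"}'
```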
2. Initial Configuration
Follow the setup wizard to configure:
- Administrator account
- Organization settings
- Basic security policies
- Email notifications
3. License Activation
Enter your license key or start with the trial version:
- Trial: 30-day full feature access
- Enterprise: Contact sales for license
- Community: Open source features only
Security Notice
Change the default administrator password immediately after first login. Go to Settings → User Management → Change Password.
Quick Platform Tour
Let's explore the main areas of the DOCC platform:
Dashboard
The main dashboard provides an overview of your data operations. You'll see system health, active jobs, data quality metrics, and recent activity.
- System Status: Overall platform health and performance
- Active Jobs: Currently running data pipelines and tasks
- Quality Metrics: Data quality scores and trends
- Recent Activity: Latest data operations and user actions
Data Catalog
The data catalog helps you discover, understand, and manage your data assets. Browse datasets, view metadata, and explore data lineage.
- Dataset Browser: Explore all available datasets
- Search & Filters: Find data by name, tags, or properties
- Metadata Viewer: Detailed information about each dataset
- Lineage Graph: Visual representation of data flow
Pipeline Designer
Create data processing workflows using the visual pipeline designer. Drag and drop components to build complex data transformations.
- Component Library: Pre-built processing components
- Visual Editor: Drag-and-drop pipeline creation
- Real-time Preview: See data as it flows through the pipeline
- Execution Monitor: Track pipeline runs and performance
Create Your First Project
Let's create a simple data pipeline to get you started:
Step 1: Create a New Project
1. Click "New Project" in the dashboard
2. Enter project name: "My First Pipeline"
3. Select template: "Data Ingestion & Quality"
4. Click "Create Project"
Step 2: Add Data Source
Connect to your first data source (a configuration sketch follows these steps):
1. Go to Data Sources → Add New
2. Choose a connector type (CSV file, database, or API)
3. Configure the connection parameters
4. Test the connection
5. Save the data source
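For a CSV file source, the configuration typically reduces to a file path plus parsing options. The request below is only a sketch: the /api/v1/datasources endpoint and the field names are assumptions for illustration, not DOCC's documented API.

```bash
# Hypothetical endpoint and field names -- illustrative only
curl -X POST http://localhost:8080/api/v1/datasources \
  -H "Content-Type: application/json" \
  -d '{
        "name": "customer-csv",
        "type": "csv",
        "config": {"path": "/data/customers.csv", "delimiter": ",", "header": true}
      }'
```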
Step 3: Build Your Pipeline
1. Data Ingestion
Drag the "File Reader" component and configure it to read your data source.
2. Data Quality
Add quality checks like "Null Check" and "Schema Validation" components.
3. Transformation
Apply basic transformations like "Filter" or "Column Rename" if needed.
4. Output
Connect a "Data Writer" component to store the processed data. (A sketch of the assembled pipeline follows.)
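The four stages above chain a reader into quality checks, an optional transform, and a writer. Written out declaratively (the YAML schema below is a sketch of that shape, not DOCC's actual pipeline format), the pipeline might look like:

```bash
# Sketch only -- the schema is an assumption; component names mirror the steps above
cat > my-first-pipeline.yaml <<'EOF'
name: my-first-pipeline
steps:
  - id: ingest
    component: file-reader     # 1. Data Ingestion
  - id: quality
    component: null-check      # 2. Data Quality
    depends_on: [ingest]
  - id: transform
    component: filter          # 3. Transformation
    depends_on: [quality]
  - id: output
    component: data-writer     # 4. Output
    depends_on: [transform]
EOF
```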
Step 4: Run Your Pipeline
Execute your first pipeline (see the command-line sketch after these steps):
1. Click "Validate Pipeline" to check for errors
2. Click "Run Pipeline" to start execution
3. Monitor progress in the execution panel
4. Check results in the output destination
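Assuming a pipeline REST API exists, the validate/run/monitor cycle could also be scripted; the endpoints below are hypothetical, mirroring the steps above:

```bash
# Hypothetical endpoints -- illustrative only
curl -X POST http://localhost:8080/api/v1/pipelines/my-first-pipeline/validate
curl -X POST http://localhost:8080/api/v1/pipelines/my-first-pipeline/run
curl http://localhost:8080/api/v1/pipelines/my-first-pipeline/runs/latest   # poll run status
```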
Congratulations!
You've successfully created and run your first data pipeline. Check the dashboard to see your pipeline's performance metrics and quality scores.
Next Steps
Now that you're familiar with the basics, explore these advanced features:
Advanced Analytics
Learn to use SQL queries, notebooks, and machine learning capabilities.
Security & Governance
Set up user access controls, data governance policies, and compliance.
Integrations
Connect DOCC with your existing tools and data infrastructure.