# Getting Started with DOCC

This guide will help you get up and running with the Data Operation Control Center (DOCC) platform. Follow these steps to set up your environment and create your first data pipeline.
## Prerequisites

Before you begin, ensure you have the following:

### System Requirements

- Operating System: Linux (Ubuntu 20.04+, CentOS 8+) or macOS 10.15+
- Memory: Minimum 8GB RAM (16GB recommended)
- Storage: At least 50GB free disk space
- Network: Internet connectivity for package downloads
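On a Linux host you can confirm the memory, disk, and OS requirements from a terminal with the standard commands below (macOS users can check the same values via About This Mac and Disk Utility):

```bash
# Check available memory and free disk space
free -h
df -h

# Check the kernel and OS version
uname -a
```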
### Required Software

| Component | Version | Purpose |
|---|---|---|
| Docker | 20.10+ | Container runtime |
| Docker Compose | 2.0+ | Multi-container orchestration |
| Java | 11 or 17 | Runtime environment |
| Python | 3.8+ | Scripting and notebooks |
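You can verify the installed versions from a terminal:

```bash
# Confirm each prerequisite is installed and recent enough
docker --version
docker-compose --version      # or: docker compose version
java -version
python3 --version
```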
## Installation

DOCC can be installed using Docker Compose for a quick start, or deployed to Kubernetes for production environments.

### Quick Start with Docker Compose

For development and testing, use our Docker Compose setup:

```bash
# Download the Docker Compose file
curl -O https://releases.giboondata.com/docker-compose.yml

# Start DOCC services
docker-compose up -d

# Verify installation
docker-compose ps
```
> **Installation Complete:** Once all services are running, DOCC will be available at http://localhost:8080.
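To confirm the UI is reachable from the command line before opening a browser, and to see what the services are doing while they start up:

```bash
# Check that the web UI answers on port 8080
curl -I http://localhost:8080

# Follow the service logs if the UI does not come up
docker-compose logs -f
```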
### Production Installation

For production deployments, we recommend using Kubernetes. See our detailed installation guide for complete instructions.
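As a rough sketch only, a Helm-based install might look like the commands below; the repository URL, chart name, and namespace are placeholders rather than confirmed artifacts, so use the values from the detailed installation guide.

```bash
# Hypothetical Helm install -- repository URL, chart name, and namespace are placeholders
helm repo add giboondata https://charts.giboondata.com   # placeholder repository
helm repo update
helm install docc giboondata/docc \
  --namespace docc --create-namespace
```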
## First Login

After installation, you'll need to complete the initial setup:

### 1. Access the Platform

Navigate to http://localhost:8080 in your web browser.

Default credentials:

- Username: `admin`
- Password: `admin123`
### 2. Initial Configuration

Follow the setup wizard to configure:

- Administrator account
- Organization settings
- Basic security policies
- Email notifications
### 3. License Activation

Enter your license key or start with the trial version:

- Trial: 30-day full feature access
- Enterprise: Contact sales for a license
- Community: Open source features only
> **Security Notice:** Change the default administrator password immediately after first login. Go to Settings → User Management → Change Password.
## Quick Platform Tour

Let's explore the main areas of the DOCC platform:

### Dashboard

The main dashboard provides an overview of your data operations. You'll see system health, active jobs, data quality metrics, and recent activity.

- System Status: Overall platform health and performance
- Active Jobs: Currently running data pipelines and tasks
- Quality Metrics: Data quality scores and trends
- Recent Activity: Latest data operations and user actions
### Data Catalog

The data catalog helps you discover, understand, and manage your data assets. Browse datasets, view metadata, and explore data lineage.

- Dataset Browser: Explore all available datasets
- Search & Filters: Find data by name, tags, or properties
- Metadata Viewer: Detailed information about each dataset
- Lineage Graph: Visual representation of data flow
### Pipeline Designer

Create data processing workflows using the visual pipeline designer. Drag and drop components to build complex data transformations.

- Component Library: Pre-built processing components
- Visual Editor: Drag-and-drop pipeline creation
- Real-time Preview: See data as it flows through the pipeline
- Execution Monitor: Track pipeline runs and performance
## Create Your First Project

Let's create a simple data pipeline to get you started:

### Step 1: Create a New Project

1. Click "New Project" in the dashboard
2. Enter the project name: "My First Pipeline"
3. Select the template: "Data Ingestion & Quality"
4. Click "Create Project"
### Step 2: Add Data Source

Connect to your first data source (a scripted alternative is sketched after these steps):

- Go to Data Sources → Add New
- Choose a connector type (CSV file, database, API)
- Configure the connection parameters
- Test the connection
- Save the data source
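If you prefer to script this step, platforms like DOCC typically expose a REST API. The example below only illustrates what registering a CSV source could look like; the endpoint path, JSON fields, and token variable are assumptions, not a documented DOCC API.

```bash
# Hypothetical example only: the /api/datasources endpoint, JSON fields, and
# DOCC_TOKEN variable are assumptions, not documented DOCC API calls.
curl -X POST http://localhost:8080/api/datasources \
  -H "Authorization: Bearer $DOCC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "customers_csv",
        "type": "csv",
        "config": { "path": "/data/customers.csv", "delimiter": "," }
      }'
```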
### Step 3: Build Your Pipeline

1. Data Ingestion

   Drag in the "File Reader" component and configure it to read your data source.

2. Data Quality

   Add quality checks such as the "Null Check" and "Schema Validation" components.

3. Transformation

   Apply basic transformations such as "Filter" or "Column Rename" if needed.

4. Output

   Connect a "Data Writer" component to store the processed data. A sketch of this component sequence as a configuration file follows below.
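The pipeline you assemble in the visual editor amounts to the component sequence summarized below. The YAML is a hypothetical notation written only to make that sequence explicit; it is not a file format DOCC is documented to accept.

```bash
# Hypothetical summary of the pipeline assembled above; the YAML layout is
# illustrative only, not a documented DOCC pipeline format.
cat > my-first-pipeline.yaml <<'EOF'
name: my-first-pipeline
steps:
  - component: file-reader         # 1. Data Ingestion
  - component: null-check          # 2. Data Quality
  - component: schema-validation
  - component: filter              # 3. Transformation (optional)
  - component: data-writer         # 4. Output
EOF
```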
### Step 4: Run Your Pipeline

Execute your first pipeline (a scripted equivalent is sketched after these steps):

1. Click "Validate Pipeline" to check for errors
2. Click "Run Pipeline" to start execution
3. Monitor progress in the execution panel
4. Check the results in the output destination
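The same run can be driven from a script once you have an access token. As with the earlier data source example, the run endpoints below are assumptions for illustration, not a documented DOCC API.

```bash
# Hypothetical scripted run -- endpoint paths and token handling are assumptions
PIPELINE_ID="my-first-pipeline"

# Trigger a run
curl -X POST "http://localhost:8080/api/pipelines/$PIPELINE_ID/runs" \
  -H "Authorization: Bearer $DOCC_TOKEN"

# Check the status of the most recent run
curl -s "http://localhost:8080/api/pipelines/$PIPELINE_ID/runs/latest" \
  -H "Authorization: Bearer $DOCC_TOKEN"
```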
> **Congratulations!** You've successfully created and run your first data pipeline. Check the dashboard to see your pipeline's performance metrics and quality scores.
## Next Steps

Now that you're familiar with the basics, explore these advanced features:

### Advanced Analytics

Learn to use SQL queries, notebooks, and machine learning capabilities.

### Security & Governance

Set up user access controls, data governance policies, and compliance.

### Integrations

Connect DOCC with your existing tools and data infrastructure.