Prerequisites>

Before you begin, ensure you have the following:

System Requirements>
  • Operating System: Linux (Ubuntu 20.04+, CentOS 8+) or macOS 10.15+
  • Memory: Minimum 8GB RAM (16GB recommended)
  • Storage: At least 50GB free disk space
  • Network: Internet connectivity for package downloads
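On Linux you can confirm available memory and free disk space from a terminal before installing (macOS reports the same figures under About This Mac and Disk Utility):

# Check available memory
free -h

# Check free disk space on the install volume
df -h /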

Required Software>
  Component        Version    Purpose
  Docker           20.10+     Container runtime
  Docker Compose   2.0+       Multi-container orchestration
  Java             11 or 17   Runtime environment
  Python           3.8+       Scripting and notebooks
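You can quickly confirm that each component is installed and meets the minimum version (depending on how Compose was installed, it is invoked as either docker-compose or docker compose):

# Check installed versions against the table above
docker --version
docker-compose --version   # or: docker compose version
java -version
python3 --version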

Installation>

DOCC can be installed using Docker Compose for a quick start, or deployed to Kubernetes for production environments.

Quick Start with Docker Compose>

For development and testing, use our Docker Compose setup:

# Download the Docker Compose file
curl -O https://releases.giboondata.com/docker-compose.yml

# Start DOCC services
docker-compose up -d

# Verify installation
docker-compose ps

Installation Complete>

Once all services are running, DOCC will be available at http://localhost:8080.
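If the interface does not respond, confirm that the port answers and review the container logs. The service name is whichever one docker-compose ps reports for the DOCC web container:

# Confirm the UI port is answering
curl -I http://localhost:8080

# List the service names defined in the Compose file
docker-compose ps --services

# Tail logs for a specific service (substitute a name from the list above)
docker-compose logs -f <service-name>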

Production Installation>

For production deployments, we recommend using Kubernetes. See our detailed installation guide for complete instructions.
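As a rough sketch only (the detailed installation guide is the authoritative reference), a Kubernetes deployment typically means creating a dedicated namespace and applying the manifests or chart shipped with your DOCC release; the manifest file name below is a placeholder:

# Create a namespace for the platform (name is an example)
kubectl create namespace docc

# Apply the manifests shipped with your DOCC release (placeholder file name)
kubectl apply -n docc -f docc-kubernetes.yaml

# Watch the pods come up
kubectl get pods -n docc -w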

First Login>

After installation, you'll need to complete the initial setup:

1. Access the Platform>

Navigate to http://localhost:8080 in your web browser.

Default Credentials:

  • Username: admin
  • Password: admin123

2. Initial Configuration>

Follow the setup wizard to configure:

  • Administrator account
  • Organization settings
  • Basic security policies
  • Email notifications

3. License Activation>

Enter your license key or start with the trial version:

  • Trial: 30-day full feature access
  • Enterprise: Contact sales for license
  • Community: Open source features only

Security Notice>

Change the default administrator password immediately after first login. Go to Settings → User Management → Change Password.

Quick Platform Tour>

Let's explore the main areas of the DOCC platform:

Dashboard>

The main dashboard provides an overview of your data operations. You'll see system health, active jobs, data quality metrics, and recent activity.

  • System Status: Overall platform health and performance
  • Active Jobs: Currently running data pipelines and tasks
  • Quality Metrics: Data quality scores and trends
  • Recent Activity: Latest data operations and user actions

Data Catalog>

The data catalog helps you discover, understand, and manage your data assets. Browse datasets, view metadata, and explore data lineage.

  • Dataset Browser: Explore all available datasets
  • Search & Filters: Find data by name, tags, or properties
  • Metadata Viewer: Detailed information about each dataset
  • Lineage Graph: Visual representation of data flow

Pipeline Designer>

Create data processing workflows using the visual pipeline designer. Drag and drop components to build complex data transformations.

  • Component Library: Pre-built processing components
  • Visual Editor: Drag-and-drop pipeline creation
  • Real-time Preview: See data as it flows through the pipeline
  • Execution Monitor: Track pipeline runs and performance

Create Your First Project>

Let's create a simple data pipeline to get you started:

Step 1: Create a New Project>
1. Click "New Project" in the dashboard 2. Enter project name: "My First Pipeline" 3. Select template: "Data Ingestion & Quality" 4. Click "Create Project"

Step 2: Add Data Source>

Connect to your first data source:

  • Go to Data Sources → Add New
  • Choose connector type (CSV file, database, API)
  • Configure connection parameters
  • Test the connection
  • Save the data source
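If you don't have a dataset handy, a small local CSV file works well as a first source for the CSV connector. The file below is only an example; the blank fields are deliberate so the quality checks in the next step have something to flag:

# Create a small sample CSV to register as a data source (example only)
cat > sample_customers.csv <<'EOF'
id,name,email,signup_date
1,Alice,alice@example.com,2024-01-15
2,Bob,,2024-02-03
3,Carol,carol@example.com,
EOF

# Quick look at the file before registering it
head -n 5 sample_customers.csv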

Step 3: Build Your Pipeline>

1. Data Ingestion>

Drag the "File Reader" component and configure it to read your data source.

2. Data Quality>

Add quality checks like "Null Check" and "Schema Validation" components.

3. Transformation>

Apply basic transformations like "Filter" or "Column Rename" if needed.

4. Output>

Connect a "Data Writer" component to store the processed data.

Step 4: Run Your Pipeline>

Execute your first pipeline:

1. Click "Validate Pipeline" to check for errors 2. Click "Run Pipeline" to start execution 3. Monitor progress in the execution panel 4. Check results in the output destination

Congratulations!>

You've successfully created and run your first data pipeline. Check the dashboard to see your pipeline's performance metrics and quality scores.

Next Steps>

Now that you're familiar with the basics, explore these advanced features:

Advanced Analytics>

Learn to use SQL queries, notebooks, and machine learning capabilities.

Security & Governance>

Set up user access controls, data governance policies, and compliance.

Integrations>

Connect DOCC with your existing tools and data infrastructure.