Prerequisites

Before you begin, ensure you have the following:

System Requirements

  • Operating System: Linux (Ubuntu 20.04+, CentOS 8+) or macOS 10.15+
  • Memory: Minimum 8GB RAM (16GB recommended)
  • Storage: At least 50GB free disk space
  • Network: Internet connectivity for package downloads
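
On Linux you can quickly confirm the memory and disk requirements from a terminal (on macOS, sysctl hw.memsize and df -h give the same information):

# Check total memory (should be at least 8GB, 16GB recommended)
free -h

# Check free disk space on the installation volume (at least 50GB)
df -h /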

Required Software

Component        Version    Purpose
Docker           20.10+     Container runtime
Docker Compose   2.0+       Multi-container orchestration
Java             11 or 17   Runtime environment
Python           3.8+       Scripting and notebooks
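
Before continuing, you can confirm the required software versions from a terminal:

docker --version           # expect 20.10 or newer
docker-compose --version   # expect 2.0 or newer (Compose v2 also answers to: docker compose version)
java -version              # expect 11 or 17
python3 --version          # expect 3.8 or newer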

Installation

DOCC can be installed using Docker Compose for a quick start, or deployed to Kubernetes for production environments.

Quick Start with Docker Compose

For development and testing, use our Docker Compose setup:

# Download the Docker Compose file
curl -O https://releases.giboondata.com/docker-compose.yml

# Start DOCC services
docker-compose up -d

# Verify installation
docker-compose ps

Installation Complete

Once all services are running, DOCC is available at http://localhost:8080.
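
To confirm the web UI is reachable before you log in, a plain HTTP check is enough; this only verifies that the port responds and does not assume any particular health endpoint:

# Expect an HTTP response (for example 200, or a redirect to the login page)
curl -I http://localhost:8080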

Production Installation

For production deployments, we recommend using Kubernetes. See our detailed installation guide for complete instructions.
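
As a rough sketch only: assuming DOCC is distributed as a Helm chart (the repository URL and chart name below are placeholders, not values confirmed by this guide), a production install would follow the usual Helm pattern:

# Placeholder repository URL and chart name -- use the values from the detailed installation guide
helm repo add giboondata https://charts.giboondata.com
helm repo update

# Install DOCC into its own namespace
helm install docc giboondata/docc --namespace docc --create-namespace

# Verify that all pods reach the Running state
kubectl get pods -n docc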

First Login

After installation, you'll need to complete the initial setup:

1. Access the Platform

Navigate to http://localhost:8080 in your web browser.

Default Credentials:

  • Username: admin
  • Password: admin123

2. Initial Configuration

Follow the setup wizard to configure:

  • Administrator account
  • Organization settings
  • Basic security policies
  • Email notifications

3. License Activation

Enter your license key or start with the trial version:

  • Trial: 30-day full feature access
  • Enterprise: Contact sales for license
  • Community: Open source features only

Security Notice

Change the default administrator password immediately after first login. Go to Settings → User Management → Change Password.

Quick Platform Tour

Let's explore the main areas of the DOCC platform:

Dashboard

The main dashboard provides an overview of your data operations. You'll see system health, active jobs, data quality metrics, and recent activity.

  • System Status: Overall platform health and performance
  • Active Jobs: Currently running data pipelines and tasks
  • Quality Metrics: Data quality scores and trends
  • Recent Activity: Latest data operations and user actions

Data Catalog

The data catalog helps you discover, understand, and manage your data assets. Browse datasets, view metadata, and explore data lineage.

  • Dataset Browser: Explore all available datasets
  • Search & Filters: Find data by name, tags, or properties
  • Metadata Viewer: Detailed information about each dataset
  • Lineage Graph: Visual representation of data flow

Pipeline Designer

Create data processing workflows using the visual pipeline designer. Drag and drop components to build complex data transformations.

  • Component Library: Pre-built processing components
  • Visual Editor: Drag-and-drop pipeline creation
  • Real-time Preview: See data as it flows through the pipeline
  • Execution Monitor: Track pipeline runs and performance

Create Your First Project

Let's create a simple data pipeline to get you started:

Step 1: Create a New Project

1. Click "New Project" in the dashboard
2. Enter project name: "My First Pipeline"
3. Select template: "Data Ingestion & Quality"
4. Click "Create Project"

Step 2: Add Data Source

Connect to your first data source:

  • Go to Data Sources → Add New
  • Choose connector type (CSV file, database, API); a sample CSV you can create yourself is sketched after this list
  • Configure connection parameters
  • Test the connection
  • Save the data source
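
If you don't have a dataset handy, you can create a small test file for the CSV connector; the file name and columns below are just an example, not data that ships with DOCC:

# Create a small sample CSV to register as a data source
cat > sample_customers.csv <<'EOF'
id,name,email,signup_date
1,Alice,alice@example.com,2024-01-15
2,Bob,,2024-02-03
3,Carol,carol@example.com,2024-02-20
EOF

The blank email in the second row gives the "Null Check" component in Step 3 something to flag.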

Step 3: Build Your Pipeline

1. Data Ingestion

Drag the "File Reader" component and configure it to read your data source.

2. Data Quality

Add quality checks like "Null Check" and "Schema Validation" components.

3. Transformation

Apply basic transformations like "Filter" or "Column Rename" if needed.

4. Output

Connect a "Data Writer" component to store the processed data.

Step 4: Run Your Pipeline

Execute your first pipeline:

1. Click "Validate Pipeline" to check for errors
2. Click "Run Pipeline" to start execution
3. Monitor progress in the execution panel
4. Check results in the output destination

Congratulations!

You've successfully created and run your first data pipeline. Check the dashboard to see your pipeline's performance metrics and quality scores.

Next Steps

Now that you're familiar with the basics, explore these advanced features:

Advanced Analytics

Learn to use SQL queries, notebooks, and machine learning capabilities.

Security & Governance

Set up user access controls, data governance policies, and compliance.

Integrations

Connect DOCC with your existing tools and data infrastructure.