Overview
Data Hub
Purpose: Centralize and streamline data ingestion, processing, and management for seamless integration and actionable insights.
Data Connectors
- Seamless Integration: Import data from, and export data to, enterprise tools such as databases, CRM systems, and ERP platforms.
- Data Consistency: Keep your enterprise data synchronized across all sources.
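The platform's connector API is not shown in this overview; as a minimal sketch of the import-and-sync idea, the example below does a naive full-refresh copy of a table between two databases, using Python's built-in sqlite3 as a stand-in for an enterprise source (a real connector would sync incrementally).

```python
import sqlite3

def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy all rows of `table` from src to dst, replacing the old copy."""
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    dst.execute(f"DROP TABLE IF EXISTS {table}")
    dst.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    dst.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' * len(cols))})", rows
    )
    dst.commit()
    return len(rows)

# Demo: a 'CRM' source with two contacts, synced into an empty destination.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE contacts (id INTEGER, name TEXT)")
src.executemany("INSERT INTO contacts VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
dst = sqlite3.connect(":memory:")
copied = sync_table(src, dst, "contacts")
```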
Document Processing
- Unstructured Data Extraction: Parse and process documents like PDFs, images, and scanned files.
- Automation: Enhance workflows with intelligent data extraction capabilities.
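The actual pipeline parses PDFs, images, and scans with machine-learning models; as a stdlib-only illustration of the extraction step, the sketch below pulls structured fields out of already-OCR'd invoice text with rules (the field names and patterns are illustrative, not the platform's schema).

```python
import re

def extract_invoice_fields(text: str) -> dict:
    """Pull a few structured fields out of raw (e.g. OCR'd) invoice text.

    A rule-based stand-in for the ML-driven extraction described above.
    """
    patterns = {
        "invoice_no": r"Invoice\s*#?\s*:\s*(\S+)",
        "date": r"Date\s*:\s*([\d/-]+)",
        "total": r"Total\s*:\s*\$?([\d.,]+)",
    }
    return {
        field: (m.group(1) if (m := re.search(rx, text, re.I)) else None)
        for field, rx in patterns.items()
    }

sample = "Invoice #: INV-0042\nDate: 2024-03-01\nTotal: $1,250.00"
fields = extract_invoice_fields(sample)
# fields == {"invoice_no": "INV-0042", "date": "2024-03-01", "total": "1,250.00"}
```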
Dataset Management
- Effortless Organization: Upload, group, and manage datasets efficiently.
- Flexible Formats: Support for CSV, Delta, and other formats, with logical categorization of datasets.
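The dataset API itself is not documented here; as a minimal sketch of the upload-and-group idea, the example below parses a CSV payload with the standard library and files it under a named group in an in-memory catalog (names are illustrative).

```python
import csv
import io

# Hypothetical in-memory catalog: group name -> dataset name -> rows.
catalog: dict[str, dict[str, list[dict]]] = {}

def register_dataset(group: str, name: str, csv_text: str) -> int:
    """Parse a CSV payload and file it under a named group; returns row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    catalog.setdefault(group, {})[name] = rows
    return len(rows)

n = register_dataset("sales", "q1_orders", "id,amount\n1,100\n2,250\n")
# catalog["sales"]["q1_orders"][0] == {"id": "1", "amount": "100"}
```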
Automation Hub
Purpose: Streamline the design and execution of workflows with advanced automation capabilities, enabling scalable and efficient data and computational processes.
Transform Node Workflows
- Versatile Data Processing: Filter, transform, enrich, and aggregate data beyond traditional SQL capabilities.
- Complex Data Manipulations: Handle reshaping, validation, and multi-source data integration.
- Advanced Features: Includes partitioning, ranking, dynamic enrichment, and support for batch and real-time processing.
- Key Benefits: Streamline ETL pipelines, improve data quality, and drive scalable analytics.
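The filter-enrich-aggregate pattern above can be sketched in a few lines of plain Python; this is an illustration of the transformation steps a transform node composes, not the node's actual configuration syntax, and the data and lookup table are invented.

```python
from collections import defaultdict

orders = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 200.0},
]
fx = {"EU": 1.08, "US": 1.0}  # enrichment lookup: region -> USD rate

# Filter, enrich from a second source, then aggregate per region.
totals: dict[str, float] = defaultdict(float)
for o in orders:
    if o["amount"] >= 100:                   # filter
        usd = o["amount"] * fx[o["region"]]  # enrich via lookup
        totals[o["region"]] += usd           # aggregate
```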
Custom Code Node Workflows
- Custom Logic Integration: Execute tailored workflows with configurable custom code nodes.
- Seamless Development: Leverage an integrated VS Code server for code creation and version control.
- Automated Builds: Trigger Docker builds managed by GitHub Actions for efficiency and reliability.
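The calling contract between the workflow engine and a custom code node is not specified in this overview; the sketch below shows one plausible shape, a registry of named functions that each take and return a payload dict (the decorator, registry, and node name are all hypothetical).

```python
from typing import Callable

# Hypothetical node registry; the real platform packages custom code into
# containers built by GitHub Actions, which this sketch does not model.
NODES: dict[str, Callable[[dict], dict]] = {}

def code_node(name: str):
    """Register a function as a custom code node under `name`."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        NODES[name] = fn
        return fn
    return wrap

@code_node("normalize_names")
def normalize_names(payload: dict) -> dict:
    payload["name"] = payload["name"].strip().title()
    return payload

result = NODES["normalize_names"]({"name": "  ada LOVELACE "})
# result == {"name": "Ada Lovelace"}
```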
Compute Node Workflows
- Document Processing Power: Automate document analysis with advanced machine learning techniques.
- Core Features: Perform classification, feature extraction, and embedding generation for structured and unstructured data.
- Optimized IDP: Accelerate Intelligent Document Processing with scalable workflows.
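Classification and embedding in a compute node rely on learned models; as a toy stand-in that shows the same shape of pipeline, the sketch below embeds documents as bag-of-words vectors and classifies by cosine similarity to labeled examples (all names and data here are illustrative).

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; real compute nodes use learned models."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text: str, examples: dict[str, str]) -> str:
    """Label a document by its most similar labeled example."""
    doc = embed(text)
    return max(examples, key=lambda label: cosine(doc, embed(examples[label])))

examples = {
    "invoice": "invoice total amount due",
    "resume": "experience skills education",
}
label = classify("amount due on this invoice", examples)
# label == "invoice"
```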
Spark Node Workflows
- Distributed Computing: Harness Apache Spark for high-performance batch and stream processing.
- Big Data Analytics: Enable ETL, machine learning (via MLlib), graph analytics (GraphX), and SQL queries.
- In-Memory Speed: Process structured, semi-structured, and unstructured data rapidly.
- Real-Time Insights: Power data science, real-time analytics, and robust data integration workflows.
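The partition-map-reduce pattern that Spark distributes across a cluster can be previewed on a single machine; the sketch below is plain Python (not PySpark) and only illustrates the execution shape, with an illustrative word-count job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

lines = ["spark makes big data fast", "big data needs spark", "fast fast fast"]

def count_words(partition: list[str]) -> Counter:
    """Map step: word counts for one partition of the input."""
    return Counter(word for line in partition for word in line.split())

partitions = [lines[i::2] for i in range(2)]            # partition the data
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(count_words, partitions))  # map per partition
totals = reduce(lambda a, b: a + b, partials)           # reduce/combine step
# totals["fast"] == 4, totals["spark"] == 2
```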
Developer Hub
Purpose: Equip developers with tools for advanced data science and machine learning operations.
Notebooks
- Develop and test machine learning models.
- Share and manage Jupyter notebooks for collaborative projects.
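A typical notebook iteration loop is fit, evaluate, refine; the cell-sized sketch below fits a one-variable linear regression by least squares and checks its error, using invented data (real notebooks would use the platform's datasets and ML libraries).

```python
# Fit y = a*x + b by ordinary least squares on a toy dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]  # roughly y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - a * mean_x

# Evaluate: mean squared error of the fitted line.
mse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
```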
MLOps
- Register, monitor, analyze, and optimize ML models.
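The register-and-compare loop at the core of MLOps can be sketched as a small in-memory registry; the class, fields, and metric names below are hypothetical and only illustrate versioned registration plus metric-based selection, not the platform's API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: model name -> list of versions."""
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, name: str, metrics: dict) -> int:
        """Store a new version of `name`; returns the version number."""
        entries = self.versions.setdefault(name, [])
        entries.append({"version": len(entries) + 1, "metrics": metrics})
        return len(entries)

    def best(self, name: str, metric: str) -> dict:
        """Pick the registered version with the highest value of `metric`."""
        return max(self.versions[name], key=lambda e: e["metrics"][metric])

reg = ModelRegistry()
reg.register("churn", {"auc": 0.81})
v = reg.register("churn", {"auc": 0.86})
# v == 2; reg.best("churn", "auc")["version"] == 2
```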