Overview
Data Hub
Purpose: Centralize and streamline data ingestion, processing, and management for seamless integration and actionable insights.
Data Connectors
- Seamless Integration: Import data from, and export data to, enterprise tools such as databases, CRM systems, and ERP platforms.
- Data Consistency: Keep your enterprise data synchronized across all sources.
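The platform's connector API is not shown in this overview; as a minimal sketch of the import-and-sync idea, the example below does a naive full-refresh copy of a table between two databases, using Python's built-in sqlite3 as a stand-in for an enterprise source (a real connector would sync incrementally).

```python
import sqlite3

def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy all rows of `table` from src to dst, replacing the old copy."""
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    dst.execute(f"DROP TABLE IF EXISTS {table}")
    dst.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    dst.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' * len(cols))})", rows
    )
    dst.commit()
    return len(rows)

# Demo: a 'CRM' source with two contacts, synced into an empty destination.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE contacts (id INTEGER, name TEXT)")
src.executemany("INSERT INTO contacts VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
dst = sqlite3.connect(":memory:")
copied = sync_table(src, dst, "contacts")
```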
Document Processing
- Unstructured Data Extraction: Parse and process documents like PDFs, images, and scanned files.
- Automation: Enhance workflows with intelligent data extraction capabilities.
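The actual pipeline parses PDFs, images, and scans with machine-learning models; as a stdlib-only illustration of the extraction step, the sketch below pulls structured fields out of already-OCR'd invoice text with rules (the field names and patterns are illustrative, not the platform's schema).

```python
import re

def extract_invoice_fields(text: str) -> dict:
    """Pull a few structured fields out of raw (e.g. OCR'd) invoice text.

    A rule-based stand-in for the ML-driven extraction described above.
    """
    patterns = {
        "invoice_no": r"Invoice\s*#?\s*:\s*(\S+)",
        "date": r"Date\s*:\s*([\d/-]+)",
        "total": r"Total\s*:\s*\$?([\d.,]+)",
    }
    return {
        field: (m.group(1) if (m := re.search(rx, text, re.I)) else None)
        for field, rx in patterns.items()
    }

sample = "Invoice #: INV-0042\nDate: 2024-03-01\nTotal: $1,250.00"
fields = extract_invoice_fields(sample)
# fields == {"invoice_no": "INV-0042", "date": "2024-03-01", "total": "1,250.00"}
```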
Dataset Management
- Effortless Organization: Upload, group, and manage datasets efficiently.
- Flexible Formats: Support for CSV, Delta, and other formats, with logical categorization of datasets.
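The dataset API itself is not documented here; as a minimal sketch of the upload-and-group idea, the example below parses a CSV payload with the standard library and files it under a named group in an in-memory catalog (names are illustrative).

```python
import csv
import io

# Hypothetical in-memory catalog: group name -> dataset name -> rows.
catalog: dict[str, dict[str, list[dict]]] = {}

def register_dataset(group: str, name: str, csv_text: str) -> int:
    """Parse a CSV payload and file it under a named group; returns row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    catalog.setdefault(group, {})[name] = rows
    return len(rows)

n = register_dataset("sales", "q1_orders", "id,amount\n1,100\n2,250\n")
# catalog["sales"]["q1_orders"][0] == {"id": "1", "amount": "100"}
```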
Automation Hub
Purpose: Streamline the design and execution of workflows with advanced automation capabilities, enabling scalable and efficient data and computational processes.
Transform Node Workflows
- Versatile Data Processing: Filter, transform, enrich, and aggregate data beyond traditional SQL capabilities.
- Complex Data Manipulations: Handle reshaping, validation, and multi-source data integration.
- Advanced Features: Includes partitioning, ranking, dynamic enrichment, and support for batch and real-time processing.
- Key Benefits: Streamline ETL pipelines, improve data quality, and drive scalable analytics.
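The filter-enrich-aggregate pattern above can be sketched in a few lines of plain Python; this is an illustration of the transformation steps a transform node composes, not the node's actual configuration syntax, and the data and lookup table are invented.

```python
from collections import defaultdict

orders = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 200.0},
]
fx = {"EU": 1.08, "US": 1.0}  # enrichment lookup: region -> USD rate

# Filter, enrich from a second source, then aggregate per region.
totals: dict[str, float] = defaultdict(float)
for o in orders:
    if o["amount"] >= 100:                   # filter
        usd = o["amount"] * fx[o["region"]]  # enrich via lookup
        totals[o["region"]] += usd           # aggregate
```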
Custom Code Node Workflows
- Custom Logic Integration: Execute tailored workflows with configurable custom code nodes.
- Seamless Development: Leverage an integrated VS Code server for code creation and version control.
- Automated Builds: Trigger Docker builds managed by GitHub Actions for efficiency and reliability.
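The calling contract between the workflow engine and a custom code node is not specified in this overview; the sketch below shows one plausible shape, a registry of named functions that each take and return a payload dict (the decorator, registry, and node name are all hypothetical).

```python
from typing import Callable

# Hypothetical node registry; the real platform packages custom code into
# containers built by GitHub Actions, which this sketch does not model.
NODES: dict[str, Callable[[dict], dict]] = {}

def code_node(name: str):
    """Register a function as a custom code node under `name`."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        NODES[name] = fn
        return fn
    return wrap

@code_node("normalize_names")
def normalize_names(payload: dict) -> dict:
    payload["name"] = payload["name"].strip().title()
    return payload

result = NODES["normalize_names"]({"name": "  ada LOVELACE "})
# result == {"name": "Ada Lovelace"}
```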
Compute Node Workflows
- Document Processing Power: Automate document analysis with advanced machine learning techniques.
- Core Features: Perform classification, feature extraction, and embedding generation for structured and unstructured data.
- Optimized IDP: Accelerate Intelligent Document Processing with scalable workflows.
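Classification and embedding in a compute node rely on learned models; as a toy stand-in that shows the same shape of pipeline, the sketch below embeds documents as bag-of-words vectors and classifies by cosine similarity to labeled examples (all names and data here are illustrative).

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; real compute nodes use learned models."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text: str, examples: dict[str, str]) -> str:
    """Label a document by its most similar labeled example."""
    doc = embed(text)
    return max(examples, key=lambda label: cosine(doc, embed(examples[label])))

examples = {
    "invoice": "invoice total amount due",
    "resume": "experience skills education",
}
label = classify("amount due on this invoice", examples)
# label == "invoice"
```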
Spark Node Workflows
- Distributed Computing: Harness Apache Spark for high-performance batch and stream processing.
- Big Data Analytics: Enable ETL, machine learning (via MLlib), graph analytics (GraphX), and SQL queries.
- In-Memory Speed: Process structured, semi-structured, and unstructured data rapidly.
- Real-Time Insights: Power data science, real-time analytics, and robust data integration workflows.
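The partition-map-reduce pattern that Spark distributes across a cluster can be previewed on a single machine; the sketch below is plain Python (not PySpark) and only illustrates the execution shape, with an illustrative word-count job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

lines = ["spark makes big data fast", "big data needs spark", "fast fast fast"]

def count_words(partition: list[str]) -> Counter:
    """Map step: word counts for one partition of the input."""
    return Counter(word for line in partition for word in line.split())

partitions = [lines[i::2] for i in range(2)]            # partition the data
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(count_words, partitions))  # map per partition
totals = reduce(lambda a, b: a + b, partials)           # reduce/combine step
# totals["fast"] == 4, totals["spark"] == 2
```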
Developer Hub
Purpose: Equip developers with tools for advanced data science and machine learning operations.
Notebooks
- Develop and test machine learning models.
- Share and manage Jupyter notebooks for collaborative projects.
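A typical notebook iteration loop is fit, evaluate, refine; the cell-sized sketch below fits a one-variable linear regression by least squares and checks its error, using invented data (real notebooks would use the platform's datasets and ML libraries).

```python
# Fit y = a*x + b by ordinary least squares on a toy dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]  # roughly y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - a * mean_x

# Evaluate: mean squared error of the fitted line.
mse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
```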
MLOps
- Register, monitor, analyze, and optimize ML models.
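The register-and-compare loop at the core of MLOps can be sketched as a small in-memory registry; the class, fields, and metric names below are hypothetical and only illustrate versioned registration plus metric-based selection, not the platform's API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: model name -> list of versions."""
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, name: str, metrics: dict) -> int:
        """Store a new version of `name`; returns the version number."""
        entries = self.versions.setdefault(name, [])
        entries.append({"version": len(entries) + 1, "metrics": metrics})
        return len(entries)

    def best(self, name: str, metric: str) -> dict:
        """Pick the registered version with the highest value of `metric`."""
        return max(self.versions[name], key=lambda e: e["metrics"][metric])

reg = ModelRegistry()
reg.register("churn", {"auc": 0.81})
v = reg.register("churn", {"auc": 0.86})
# v == 2; reg.best("churn", "auc")["version"] == 2
```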