The Source class serves as the abstract foundation for all data source implementations in Superlinked. It defines the interface for providing data to vector indices for processing and search operations.

Constructor

Source()
This is an abstract base class and cannot be instantiated directly. Use concrete implementations for specific data source types.
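A minimal sketch of the intended usage, assuming the common `superlinked.framework` import alias and a hypothetical two-field schema:

```python
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()

# Source itself is abstract and cannot be constructed; bind a concrete
# implementation such as InMemorySource to the schema it will feed.
source = sl.InMemorySource(product)
```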

Inheritance Hierarchy

Every concrete data source implementation derives from Source, which in turn extends SourceABC.

Available Implementations

Superlinked provides four concrete implementations: InMemorySource, InteractiveSource, RestSource, and DataLoaderSource. The sections below describe when to use each.

Source Types and Use Cases

Development Sources

InMemorySource: Perfect for development, testing, and prototyping where data is provided programmatically and stored in memory.

InteractiveSource: Ideal for interactive development environments where you want to add data incrementally and see immediate results.

Production Sources

RestSource: Designed for production applications where data arrives through REST API endpoints, enabling real-time data ingestion.

DataLoaderSource: Suitable for batch processing scenarios where data is loaded from files (CSV, JSON, Parquet) or external data sources.

Data Flow Architecture

Sources integrate into the Superlinked pipeline as the entry point for data (an end-to-end sketch follows the list):
  1. Data Input: Sources receive raw data from various inputs (APIs, files, memory)
  2. Parsing: Data is processed through associated parsers to match schema requirements
  3. Validation: Schema validation ensures data conformity
  4. Transformation: Data flows to vector spaces for embedding generation
  5. Indexing: Processed vectors are stored in indices for searching
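The sketch below walks these five steps end to end. It is a minimal illustration, not a canonical recipe: the sl.* names (InMemorySource, TextSimilaritySpace, InMemoryExecutor) follow the Superlinked docs, while the schema, model id, and record are hypothetical.

```python
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()

# Transformation: a space turns parsed text into embeddings.
description_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-MiniLM-L6-v2",
)

# Indexing: processed vectors are stored here for searching.
index = sl.Index([description_space])

# Data input: the source is the pipeline's entry point.
source = sl.InMemorySource(product)

executor = sl.InMemoryExecutor(sources=[source], indices=[index])
app = executor.run()

# Parsing and validation happen inside put(): records must match the schema.
source.put([{"id": "p1", "description": "A lightweight rain jacket"}])
```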

Source Selection Guide

For Development

InMemorySource: Use when you want to quickly test with small datasets and don’t need persistence. Perfect for tutorials and experimentation.
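For example (a hedged sketch; the query-builder calls .find(), .similar(), and .select_all() are assumed from the Superlinked docs and may differ across versions):

```python
import superlinked.framework as sl

class NoteSchema(sl.Schema):
    id: sl.IdField
    text: sl.String

note = NoteSchema()
space = sl.TextSimilaritySpace(text=note.text, model="sentence-transformers/all-MiniLM-L6-v2")
index = sl.Index([space])

source = sl.InMemorySource(note)
app = sl.InMemoryExecutor(sources=[source], indices=[index]).run()

# Nothing is persisted between runs: feed records, query, throw away.
source.put([
    {"id": "1", "text": "Grocery list: eggs, milk, bread"},
    {"id": "2", "text": "Ideas for the quarterly planning meeting"},
])

query = sl.Query(index).find(note).similar(space.text, sl.Param("q")).select_all()
print(app.query(query, q="meeting notes"))
```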

For Interactive Development

InteractiveSource: Use when building and testing incrementally, allowing you to add data on-the-fly and observe results immediately.
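In a notebook this looks like the in-memory setup, except that put() is called repeatedly as you iterate. A hedged sketch, assuming InteractiveSource and InteractiveExecutor pair the same way as their in-memory counterparts:

```python
import superlinked.framework as sl

class NoteSchema(sl.Schema):
    id: sl.IdField
    text: sl.String

note = NoteSchema()
space = sl.TextSimilaritySpace(text=note.text, model="sentence-transformers/all-MiniLM-L6-v2")
index = sl.Index([space])

source = sl.InteractiveSource(note)
app = sl.InteractiveExecutor(sources=[source], indices=[index]).run()

# Add data in small increments, inspecting query results between calls.
source.put([{"id": "1", "text": "First draft of the release announcement"}])
source.put([{"id": "2", "text": "Revised draft with reviewer feedback"}])
```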

For Production APIs

RestSource: Use for production web applications where data is ingested through HTTP endpoints. Provides scalable real-time data processing.
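A server-side sketch, assuming the RestExecutor, RestQuery, RestDescriptor, InMemoryVectorDatabase, and SuperlinkedRegistry names from the Superlinked server docs; verify these against your installed version:

```python
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()
space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-MiniLM-L6-v2",
)
index = sl.Index([space])
query = sl.Query(index).find(product).similar(space.text, sl.Param("query_text"))

# Data arrives over HTTP rather than through programmatic put() calls.
source = sl.RestSource(product)

executor = sl.RestExecutor(
    sources=[source],
    indices=[index],
    queries=[sl.RestQuery(sl.RestDescriptor("product_search"), query)],
    vector_database=sl.InMemoryVectorDatabase(),  # swap for a production store
)
sl.SuperlinkedRegistry.register(executor)  # exposes ingestion and query endpoints
```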

For Batch Processing

DataLoaderSource: Use for ETL pipelines and batch processing scenarios where you need to load large datasets from files or databases.
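A batch-loading sketch; DataLoaderConfig and DataFormat are assumed from the Superlinked docs, and the file path and column layout are hypothetical:

```python
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()

# Point the source at a file; the loader reads and ingests it in batches.
config = sl.DataLoaderConfig("data/products.csv", sl.DataFormat.CSV)
source = sl.DataLoaderSource(product, config)
```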

Integration Pattern

All source implementations follow a consistent pattern:
  1. Schema Association: Each source is associated with a specific schema that defines the data structure
  2. Parser Configuration: Sources can use custom parsers to handle different data formats (see the parser sketch after this list)
  3. Data Validation: Incoming data is validated against the associated schema
  4. Event Publishing: Sources publish data events to connected indices and processing components
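Step 2 is the one most often customized: a parser maps raw field names onto schema fields. A hedged sketch using the DataFrameParser mapping style from the Superlinked docs; the dataframe column names are hypothetical:

```python
import pandas as pd
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()

# Map dataframe columns onto schema fields where the names differ.
parser = sl.DataFrameParser(
    product,
    mapping={product.id: "product_id", product.description: "desc"},
)
source = sl.InMemorySource(product, parser=parser)

# With a DataFrameParser attached, put() accepts dataframes directly.
source.put([pd.DataFrame([{"product_id": "p1", "desc": "Wool hiking socks"}])])
```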

Best Practices

Schema Design

Schema Consistency: Ensure your data source consistently provides data that matches the associated schema structure. Mismatched data will cause processing failures.
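For example, with an id-plus-description schema every incoming record must carry both fields with compatible types; a record missing a field should be rejected at ingestion rather than silently indexed. A hedged sketch reusing the pipeline setup from above:

```python
import superlinked.framework as sl

class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String

product = ProductSchema()
space = sl.TextSimilaritySpace(text=product.description, model="sentence-transformers/all-MiniLM-L6-v2")
index = sl.Index([space])
source = sl.InMemorySource(product)
app = sl.InMemoryExecutor(sources=[source], indices=[index]).run()

# Conforming record: every schema field present with the expected type.
source.put([{"id": "p1", "description": "Matches the schema"}])

# Non-conforming record: "description" is missing, so ingestion is
# expected to fail validation rather than index partial data.
# source.put([{"id": "p2"}])
```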

Performance Considerations

Batch Processing: For large datasets, consider using DataLoaderSource with appropriate batch sizes to optimize memory usage and processing performance.
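The same idea applies to programmatic ingestion: chunk large record lists into bounded put() calls. A framework-agnostic sketch (the batch size is an arbitrary illustration):

```python
def put_in_batches(source, records, batch_size=1_000):
    """Feed records to a source in fixed-size chunks to bound memory use."""
    for start in range(0, len(records), batch_size):
        source.put(records[start:start + batch_size])
```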

Error Handling

Data Validation: Implement proper error handling for data validation failures, especially in production environments where data quality may vary.
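A defensive ingestion sketch that isolates failures per record, so one malformed input does not abort the whole load; the exception type is deliberately broad because the concrete validation error class depends on the Superlinked version:

```python
import logging

logger = logging.getLogger(__name__)

def safe_put(source, records):
    """Ingest dict records one at a time, logging failures instead of raising."""
    failed = []
    for record in records:
        try:
            source.put([record])
        except Exception as exc:  # narrow this to the library's validation error
            logger.warning("Rejected record %r: %s", record.get("id"), exc)
            failed.append(record)
    return failed  # hand back for inspection or a dead-letter queue
```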

Data Processing Pipeline

Sources work together with other Superlinked components:
  • Schemas: Define the structure of data that sources must provide
  • Parsers: Transform raw data from sources into schema-compliant format
  • Spaces: Convert parsed data into vector representations
  • Indices: Organize and store vectors for efficient searching
  • Queries: Search through the indexed data provided by sources
The abstract nature of the Source class ensures consistent behavior across different data input methods while allowing each implementation to optimize for its specific use case and data patterns.