InteractiveSource
The InteractiveSource class provides an interactive implementation of the Source interface that allows real-time data ingestion during application runtime. It’s designed for scenarios where you need to add data continuously while the application is running and have it immediately available for search.
Constructor
Create a new interactive source with the specified schema and optional parser.
Parameters
- schema: The schema object that defines the structure of data this source will handle. All data added to this source must conform to this schema.
- parser: Optional data parser for processing input data. If None, defaults to JsonParser for handling JSON-formatted data.
Raises
- InvalidInputException: If the schema is not an instance of SchemaObject.
Inheritance
InteractiveSource extends several classes to provide its functionality:
Inheritance Chain: InteractiveSource → OnlineSource → TransformerPublisher → Source → Generic
Descendants
- InMemorySource - Extends InteractiveSource with in-memory storage
Methods
put()
Add data to the InteractiveSource for immediate processing and indexing.
Parameters
- data: The data to add to the source. Can be a single data item or a sequence of items. Data must conform to the source’s schema.
Each call triggers:
- Schema validation
- Data parsing (if parser is configured)
- Vector generation through associated spaces
- Index updates for immediate search availability
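The "single data item or a sequence of items" contract of put() can be illustrated with a small stand-in. This is only a sketch: `as_item_list` is a hypothetical helper, not part of Superlinked, and it assumes items are plain dictionaries.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def as_item_list(
    data: Mapping[str, Any] | Sequence[Mapping[str, Any]],
) -> list[Mapping[str, Any]]:
    """Hypothetical helper: normalize put()-style input to a list of items."""
    # A single item (a mapping) is wrapped so later steps always see a list.
    if isinstance(data, Mapping):
        return [data]
    # A sequence of items is copied into a list and otherwise left unchanged.
    return list(data)


single = as_item_list({"id": "p1", "body": "first paragraph"})
many = as_item_list([{"id": "p2", "body": "second"}, {"id": "p3", "body": "third"}])
```

Normalizing early means every later step (validation, parsing, vector generation, index update) can be written once, for a list of items.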
allow_data_ingestion()
Enable data ingestion for this source so that data can be added with the put() method. Called automatically during source initialization in most cases.
Key Features
Real-Time Processing
- Immediate Availability: Data becomes searchable as soon as it’s added
- Live Updates: Indices are updated in real-time without requiring restarts
- Continuous Operation: Add data while the application is actively serving queries
Interactive Development
- Incremental Testing: Add data piece by piece and test search results immediately
- Development Flexibility: Modify datasets during development without restarting
- Rapid Iteration: Quick feedback loop for testing different data scenarios
Use Cases
Interactive Development Environments
Perfect for Jupyter notebooks and interactive development.
Live Data Streaming
Real-time data ingestion from streams or APIs.
A/B Testing and Experimentation
Dynamic content testing without restarts.
Training and Education
Interactive demonstrations and tutorials.
Performance Characteristics
Advantages
- Real-Time Updates: Immediate data availability without batch processing delays
- Development Speed: Fast iteration cycles for testing and development
- Flexibility: Dynamic data modification during runtime
Considerations
- Processing Overhead: Each put() operation runs the full processing pipeline
- Memory Usage: Data accumulates in memory over time
- Synchronous Processing: put() blocks until processing is complete
Best Practices
Data Ingestion Patterns
Batch When Possible: For multiple items, use a single put() call with a list rather than multiple individual calls to improve performance.
Error Handling
Schema Validation: Always validate data against the schema before calling put() to prevent processing failures.
Memory Management
Memory Monitoring: Monitor memory usage when continuously adding data, especially in long-running applications.
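One way to watch memory growth during continuous ingestion is Python's standard tracemalloc module. The sketch below is an assumption-laden illustration: `store` is a plain list standing in for data a long-running source accumulates, and tracemalloc only reports memory allocated by Python itself, not by native extensions.

```python
import tracemalloc

# Stand-in for data that a long-running source keeps in memory.
store: list[dict] = []

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Simulate continuous ingestion of 1000 small items.
for i in range(1000):
    store.append({"id": str(i), "body": "x" * 100})

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth_bytes = current - baseline
print(f"traced growth: {growth_bytes} bytes, peak: {peak} bytes")
```

In a real application the same two get_traced_memory() readings could be taken periodically and logged, so that steady growth is noticed before it becomes a problem.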
Development Workflow
Incremental Testing: Use InteractiveSource to build up test datasets incrementally and verify search behavior at each step.
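The batching and pre-validation advice above can be sketched as follows. Everything here is hypothetical scaffolding rather than Superlinked API: `RecordingSource` only records how put() is called, `conforms` is a simple type check standing in for real schema validation, and `batches` groups items so that put() is called once per batch.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def conforms(item: dict[str, Any], required: dict[str, type]) -> bool:
    """Hypothetical pre-check: every required field exists with the right type."""
    return all(isinstance(item.get(name), kind) for name, kind in required.items())


def batches(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group items into lists of at most `size` elements."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class RecordingSource:
    """Stand-in for a source; it only records each put() call."""

    def __init__(self) -> None:
        self.calls: list[list[dict]] = []

    def put(self, data: list[dict]) -> None:
        self.calls.append(list(data))


required = {"id": str, "body": str}
rows = [
    {"id": "a", "body": "x"},
    {"id": "b", "body": 3},  # wrong type, filtered out before put()
    {"id": "c", "body": "z"},
    {"id": "d", "body": "w"},
]

valid = [row for row in rows if conforms(row, required)]
source = RecordingSource()
for batch in batches(valid, size=2):
    source.put(batch)  # one call per batch instead of one call per item
```

Three valid rows with a batch size of two result in two put() calls rather than three, and the malformed row never reaches the source.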
Integration Pattern
InteractiveSource fits into the Superlinked pipeline as follows:
- Data Input: Call put() with new data
- Schema Validation: Data is validated against the associated schema
- Parsing: Optional parser processes the data format
- Vector Generation: Data flows through associated spaces for embedding
- Index Updates: All relevant indices are updated immediately
- Search Availability: Data becomes immediately searchable
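The steps above can be modeled with a toy class to make the order of operations concrete. This is a minimal sketch, not the library's implementation: `ToySource`, its dictionary-based schema, and its two-number "vector" are all hypothetical, and the sketch assumes every item has string `id` and `body` fields.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical schema representation: field name -> expected Python type.
Schema = dict[str, type]


class ToySource:
    """Toy model of the put() flow: validate, parse, embed, index."""

    def __init__(self, schema: Schema, parser: Callable[[dict], dict] | None = None) -> None:
        self.schema = schema
        # Without a parser, items pass through unchanged.
        self.parser = parser or (lambda item: item)
        self.index: dict[str, list[float]] = {}

    def _validate(self, item: dict) -> None:
        for field, expected in self.schema.items():
            if not isinstance(item.get(field), expected):
                raise ValueError(f"field {field!r} must be {expected.__name__}")

    @staticmethod
    def _embed(item: dict) -> list[float]:
        # Placeholder vector (character count, word count), not a real embedding.
        text = item["body"]
        return [float(len(text)), float(len(text.split()))]

    def put(self, data: Any) -> None:
        items = data if isinstance(data, list) else [data]
        for raw in items:
            self._validate(raw)              # schema validation
            item = self.parser(raw)          # parsing
            vector = self._embed(item)       # vector generation
            self.index[item["id"]] = vector  # index update


source = ToySource({"id": str, "body": str})
source.put({"id": "p1", "body": "hello world"})
source.put([{"id": "p2", "body": "real-time ingestion"}])
```

Because put() in this sketch returns only after the index entry is written, an item is visible in `source.index` as soon as the call completes, which mirrors the synchronous behavior described under Considerations.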
Error Handling
Common error scenarios and handling:
- Schema Mismatch: Ensure data structure matches the schema exactly
- Parser Errors: Validate data format if using custom parsers
- Memory Limits: Monitor memory usage for long-running applications
- Processing Failures: Handle vectorization errors gracefully
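A simple way to handle such failures is to isolate the items that fail and continue with the rest. The sketch below is hypothetical: `put_collecting_failures` is an illustrative helper, `fake_put` stands in for a real source's put() method, and a broad `Exception` is caught only because this text does not specify which exceptions put() raises.

```python
from __future__ import annotations

from typing import Callable


def put_collecting_failures(
    put: Callable[[list[dict]], None],
    items: list[dict],
) -> list[tuple[dict, Exception]]:
    """Call put() once per item and return the items that failed with their errors."""
    failures: list[tuple[dict, Exception]] = []
    for item in items:
        try:
            put([item])
        except Exception as exc:  # narrow this to the library's exceptions in real code
            failures.append((item, exc))
    return failures


def fake_put(batch: list[dict]) -> None:
    """Stand-in for a source's put(): rejects items whose body is not a string."""
    for item in batch:
        if not isinstance(item.get("body"), str):
            raise ValueError("body must be a string")


items = [
    {"id": "ok-1", "body": "fine"},
    {"id": "bad", "body": None},
    {"id": "ok-2", "body": "also fine"},
]
failures = put_collecting_failures(fake_put, items)
for item, error in failures:
    print(f"skipped {item['id']}: {error}")
```

Calling put() per item trades the efficiency of batching for the ability to pinpoint which item failed, so this pattern fits best as a fallback after a batched call has raised.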
Comparison with Other Sources
vs InMemorySource
- InteractiveSource: Base class focused on real-time ingestion
- InMemorySource: Extends InteractiveSource with memory-specific optimizations
vs RestSource
- InteractiveSource: Programmatic data addition via the put() method
- RestSource: Data ingestion through HTTP REST endpoints
vs DataLoaderSource
- InteractiveSource: Real-time, incremental data addition
- DataLoaderSource: Batch loading from files and external sources