InMemorySource
The InMemorySource class provides an in-memory implementation of the Source interface, designed for development, testing, and scenarios where data is provided programmatically. It stores data in memory and supports real-time data ingestion with immediate vector processing.
Constructor
Create a new in-memory source with the specified schema and optional parser.
Parameters
- schema: The schema object that defines the structure of data this source will handle. All data added to this source must conform to this schema.
- parser: Optional data parser for processing input data. If None, defaults to JsonParser for handling JSON-formatted data.
Raises
- InvalidInputException: If the schema is not an instance of SchemaObject.
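The constructor contract above can be illustrated with a small self-contained sketch. It rejects anything that is not a schema object and falls back to a JSON parser when parser is None. Every class in it (ToySchemaObject, ToyJsonParser, ToyInvalidInputException, ToyInMemorySource) is a hypothetical stand-in written for this example, not part of the library, and the error message is invented.

```python
import json
from typing import Any, Optional


class ToyInvalidInputException(Exception):
    """Hypothetical stand-in for the documented InvalidInputException."""


class ToySchemaObject:
    """Hypothetical schema: a name plus a mapping of field names to types."""

    def __init__(self, name: str, fields: dict) -> None:
        self.name = name
        self.fields = fields


class ToyJsonParser:
    """Hypothetical default parser: turns a JSON string into a dict."""

    def parse(self, raw: str) -> Any:
        return json.loads(raw)


class ToyInMemorySource:
    """Mirrors only the constructor contract described above."""

    def __init__(self, schema: Any, parser: Optional[Any] = None) -> None:
        # Reject anything that is not a schema object.
        if not isinstance(schema, ToySchemaObject):
            raise ToyInvalidInputException(
                f"schema must be a ToySchemaObject, got {type(schema).__name__}"
            )
        self.schema = schema
        # Default to the JSON parser when no parser is supplied.
        self.parser = parser if parser is not None else ToyJsonParser()


paragraph = ToySchemaObject("paragraph", {"id": str, "body": str})
source = ToyInMemorySource(paragraph)
print(type(source.parser).__name__)
```

Passing a plain dict instead of a schema object, for example ToyInMemorySource({"id": str}), raises ToyInvalidInputException before any state is stored.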
Inheritance
The InMemorySource class extends several classes to provide comprehensive functionality:
Inheritance Chain:
InMemorySource → InteractiveSource → OnlineSource → TransformerPublisher → Source → Generic
- Base Source functionality from Source
- Online processing capabilities from OnlineSource
- Interactive data input from InteractiveSource
- Event publishing from TransformerPublisher
- Generic type support for schema types
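To see how Python orders the classes in the chain above, the sketch below declares empty stand-in classes that reuse the documented names purely for illustration. They are hypothetical stubs with no behavior, not the library's classes. Printing the method resolution order (MRO) shows the lookup path from most to least specific, ending in Generic and object.

```python
from typing import Generic, TypeVar

# Hypothetical stand-ins that mirror only the *shape* of the documented chain.
SchemaT = TypeVar("SchemaT")


class Source(Generic[SchemaT]):
    """Stub: base source functionality."""


class TransformerPublisher(Source[SchemaT]):
    """Stub: event publishing."""


class OnlineSource(TransformerPublisher[SchemaT]):
    """Stub: online processing capabilities."""


class InteractiveSource(OnlineSource[SchemaT]):
    """Stub: interactive data input."""


class InMemorySource(InteractiveSource[SchemaT]):
    """Stub: in-memory storage on top of everything above."""


# The MRO walks the chain from the most specific class to the least specific.
mro_names = [cls.__name__ for cls in InMemorySource.__mro__]
print(" -> ".join(mro_names))
```

Because InMemorySource sits at the front of the MRO, any method it overrides takes precedence over the same method on the classes further down the chain.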
Key Features
Memory-Based Storage
All data is stored in RAM, providing:
- Fast Access: No disk I/O overhead for data operations
- Immediate Processing: Data is immediately available for vector processing
- Simple Setup: No external dependencies or database configuration
Real-Time Ingestion
Inherited from InteractiveSource:
- Continuous Data Input: Add data while the application is running
- Immediate Processing: Data is processed into vectors as soon as it’s added
- Live Updates: Indices are updated in real-time with new data
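The real-time behavior described above amounts to a publish/subscribe loop. Each put() hands the new records straight to whatever is listening, which updates its vectors before put() returns. The sketch below models that with hypothetical classes (ToyInteractiveSource, ToyIndex) and a deliberately trivial keyword-count "embedding" (toy_embed). It shows the data flow only and is not how the library computes vectors.

```python
from typing import Callable

Record = dict[str, str]


def toy_embed(text: str) -> list[float]:
    """Hypothetical 'vectorizer': counts of a few fixed keywords."""
    vocabulary = ("memory", "vector", "source")
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


class ToyIndex:
    """Keeps one vector per record id, updated as soon as data arrives."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}

    def on_new_data(self, records: list[Record]) -> None:
        for record in records:
            self.vectors[record["id"]] = toy_embed(record["body"])


class ToyInteractiveSource:
    """Publishes every put() straight to its subscribers (no batching, no queue)."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[list[Record]], None]] = []

    def subscribe(self, callback: Callable[[list[Record]], None]) -> None:
        self._subscribers.append(callback)

    def put(self, records: list[Record]) -> None:
        # Notify every subscriber synchronously, so the index is current on return.
        for callback in self._subscribers:
            callback(records)


index = ToyIndex()
source = ToyInteractiveSource()
source.subscribe(index.on_new_data)

source.put([{"id": "a", "body": "Memory beats disk"}])
print(index.vectors)  # {'a': [1.0, 0.0, 0.0]}
source.put([{"id": "b", "body": "vector source vector"}])
print(sorted(index.vectors))  # ['a', 'b']
```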
Use Cases
Development and Testing
Perfect for initial development and unit testing.
Rapid Prototyping
Ideal for quick experimentation and proofs of concept.
Interactive Development
Great for Jupyter notebooks and interactive development.
Demo Applications
Excellent for demonstrations and training.
Data Management
Programmatic Data Input
Data is added programmatically through the put() method inherited from InteractiveSource:
- Single Items: Add individual data records
- Batch Input: Add multiple records at once
- Continuous Updates: Add data while the application is running
- Real-Time Processing: Data is immediately processed and available for search
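The single-item and batch modes listed above can be served by one entry point that normalizes its input. In this sketch ToyPutSource and its put() signature are hypothetical. This page does not state whether the library's put() accepts a bare record or only a sequence of records, so check its API reference before relying on either form.

```python
from typing import Any, Union

Record = dict[str, Any]


class ToyPutSource:
    """Hypothetical source whose put() takes one record or a batch of records."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def put(self, data: Union[Record, list[Record]]) -> int:
        # Wrap a single record so both call styles share one code path.
        batch = [data] if isinstance(data, dict) else list(data)
        self.records.extend(batch)
        return len(batch)  # number of records accepted in this call


source = ToyPutSource()
source.put({"id": "1", "body": "single item"})
source.put([{"id": "2", "body": "batch, part one"}, {"id": "3", "body": "batch, part two"}])
print(len(source.records))  # 3
```

Continuous updates are then just further put() calls made while the application keeps running.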
Memory Considerations
Memory Usage: All data is stored in RAM. Monitor memory usage with large datasets to prevent out-of-memory errors.
Data Size Limits: Keep datasets reasonably sized (typically under 1GB) for optimal performance in development scenarios.
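A back-of-the-envelope estimate helps decide whether a dataset fits the guideline above. The sketch assumes dense float32 vectors (4 bytes per value) and a 768-dimensional embedding. Both are illustrative assumptions. The estimate covers raw vector storage only, not the original records, ids, or index overhead.

```python
def estimate_vector_bytes(num_records: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Rough lower bound for raw vector storage: records x dimensions x bytes per value.

    Assumes dense float32 vectors by default; ignores raw records, ids,
    index structures, and interpreter overhead.
    """
    return num_records * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


for records in (10_000, 100_000, 1_000_000):
    size = estimate_vector_bytes(records, dimensions=768)
    print(f"{records:>9,} records x 768 dims: {to_gib(size):.2f} GiB")
```

Under these assumptions the vectors alone reach 1 GiB at roughly 350,000 records (1 GiB / (768 × 4 bytes) ≈ 349,525), so the full in-memory footprint crosses that line sooner.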
Data Persistence
No Persistence: Data is lost when the application shuts down. Not suitable for production use cases requiring data durability.
Performance Characteristics
Advantages
- Speed: Fast data access and processing, with no disk I/O overhead
- Simplicity: No external dependencies or setup required
- Flexibility: Easy to modify and test with different datasets
Limitations
- Memory Constraints: Limited by available RAM
- No Persistence: Data doesn’t survive application restarts
- Single Instance: Cannot share data across multiple application instances
Best Practices
Development Workflow
Incremental Development: Start with small datasets in InMemorySource, then migrate to production sources when ready for deployment.
Testing Strategy
Isolated Tests: Each test should create its own InMemorySource instance to ensure test isolation and prevent data contamination.
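One way to get per-test isolation with Python's standard unittest module is to build the source in setUp(), which runs before every test method. ToyInMemorySource here is a hypothetical list-backed stand-in. With the real class you would construct it from your schema in the same place.

```python
import unittest


class ToyInMemorySource:
    """Hypothetical stand-in that keeps records in a plain list."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def put(self, records: list[dict]) -> None:
        self.records.extend(records)


class SourceIsolationTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh source per test: nothing written in one test leaks into another.
        self.source = ToyInMemorySource()

    def test_put_adds_one_record(self) -> None:
        self.source.put([{"id": "1"}])
        self.assertEqual(len(self.source.records), 1)

    def test_starts_empty(self) -> None:
        self.assertEqual(self.source.records, [])


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["isolation-demo"], exit=False)
```

By default unittest runs test methods in alphabetical order, so test_starts_empty runs after test_put_adds_one_record and would fail if the two tests shared a single source.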
Data Management
Schema Validation: Always validate your test data against the schema before adding it to the source, so schema mismatches are caught early.
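A lightweight pre-check can run over test records before they are handed to the source. The schema here is a hypothetical mapping from field name to Python type, and validate_record() is an illustrative helper, not a library function. The library's own schema types may enforce different rules.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
PARAGRAPH_SCHEMA: dict[str, type] = {"id": str, "body": str, "likes": int}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
        elif not isinstance(record[field], expected):
            problems.append(
                f"field '{field}' should be {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Sorted so the report order is stable when several extra fields appear.
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field '{field}'")
    return problems


test_data = [
    {"id": "1", "body": "hello", "likes": 3},
    {"id": "2", "body": "world", "likes": "many"},
]
for record in test_data:
    print(record["id"], validate_record(record, PARAGRAPH_SCHEMA) or "ok")
```

Only records that come back with an empty problem list would then be passed on to the source.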