The InMemorySource class provides an in-memory implementation of the Source interface, designed for development, testing, and scenarios where data is provided programmatically. It stores data in memory and allows for real-time data ingestion and immediate vector processing.

Constructor

Create a new in-memory source with the specified schema and optional parser.
InMemorySource(schema, parser=None)

Parameters

schema (IdSchemaObjectT, required)
The schema object that defines the structure of data this source will handle. All data added to this source must conform to this schema.

parser (DataParser | None, default: None)
Optional data parser for processing input data. If None, defaults to JsonParser for handling JSON-formatted data.

Raises: InvalidInputException if the schema is not an instance of SchemaObject.
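
The constructor contract described here (a required schema, an optional parser that falls back to a JSON parser, and an exception for an invalid schema) can be illustrated with a small stand-in. This is only a sketch: SketchSource, SketchSchema, SketchJsonParser and SketchInvalidInput are hypothetical names invented for illustration, not Superlinked classes, and the library's actual implementation may differ.

```python
class SketchInvalidInput(Exception):
    """Hypothetical stand-in for the InvalidInputException described above."""


class SketchSchema:
    """Hypothetical stand-in playing the role of SchemaObject."""


class SketchJsonParser:
    """Hypothetical stand-in for the default JSON parser."""


class SketchSource:
    def __init__(self, schema, parser=None):
        # Reject anything that is not a schema object, as the Raises note describes.
        if not isinstance(schema, SketchSchema):
            raise SketchInvalidInput(
                f"schema must be a SketchSchema, got {type(schema).__name__}"
            )
        self.schema = schema
        # Fall back to the JSON parser when no parser is supplied.
        self.parser = parser if parser is not None else SketchJsonParser()


# Valid construction: the parser defaults to the JSON stand-in.
source = SketchSource(SketchSchema())
print(type(source.parser).__name__)

# Invalid construction: a plain string is not a schema object.
try:
    SketchSource("not a schema")
except SketchInvalidInput as exc:
    print(f"rejected: {exc}")
```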

Inheritance

InMemorySource builds on several base classes.
Inheritance chain:
  • InMemorySource
  • InteractiveSource
  • OnlineSource
  • TransformerPublisher
  • Source
  • Generic
This inheritance provides:
  • Base Source functionality from Source
  • Online processing capabilities from OnlineSource
  • Interactive data input from InteractiveSource
  • Event publishing from TransformerPublisher
  • Generic type support for schema types
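
To make the chain concrete, the sketch below rebuilds it with empty stand-in classes and prints the resulting method resolution order. The class names mirror the list above, but these are local placeholders rather than imports from the library, and the assumption about which class directly inherits from which (OnlineSource combining TransformerPublisher and Source) is inferred from the order of the chain.

```python
from typing import Generic, TypeVar

SchemaT = TypeVar("SchemaT")


# Empty stand-ins; only the inheritance relationships matter here.
class Source(Generic[SchemaT]):
    pass


class TransformerPublisher:
    pass


class OnlineSource(TransformerPublisher, Source[SchemaT]):
    pass


class InteractiveSource(OnlineSource[SchemaT]):
    pass


class InMemorySource(InteractiveSource[SchemaT]):
    pass


# Collect the method resolution order, leaving out the implicit `object` base.
chain = [cls.__name__ for cls in InMemorySource.__mro__ if cls is not object]
print(" -> ".join(chain))
```

Python resolves attribute lookups in this order, so a method defined on InteractiveSource takes precedence over one with the same name on OnlineSource or Source.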

Key Features

Memory-Based Storage

All data is stored in RAM, providing:
  • Fast Access: No disk I/O overhead for data operations
  • Immediate Processing: Data is immediately available for vector processing
  • Simple Setup: No external dependencies or database configuration

Real-Time Ingestion

Inherited from InteractiveSource:
  • Continuous Data Input: Add data while the application is running
  • Immediate Processing: Data is processed into vectors as soon as it’s added
  • Live Updates: Indices are updated in real-time with new data
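
One way to picture "processed as soon as it's added" is a publish-on-put pattern: every record handed to the source is forwarded to its subscribers before put() returns. The sketch below is a simplified model under that assumption. SketchInteractiveSource, subscribe and the toy index are hypothetical and do not reflect the library's internals; the "vector" is just the text length.

```python
from typing import Callable

Record = dict[str, object]


class SketchInteractiveSource:
    """Hypothetical stand-in: forwards each record to subscribers as soon as it is put."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._subscribers.append(callback)

    def put(self, records: list[Record]) -> None:
        # Each record reaches every subscriber before put() returns.
        for record in records:
            for callback in self._subscribers:
                callback(record)


# A toy "index": maps a record id to the length of its text,
# standing in for a real embedding vector.
toy_index: dict[str, int] = {}


def index_record(record: Record) -> None:
    toy_index[str(record["id"])] = len(str(record["content"]))


source = SketchInteractiveSource()
source.subscribe(index_record)

source.put([{"id": "1", "content": "first"}])
print(toy_index)  # the first record is already indexed

source.put([{"id": "2", "content": "second one"}])
print(toy_index)  # the index has grown without a rebuild
```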

Use Cases

Development and Testing

Perfect for initial development and unit testing:
from superlinked import InMemorySource, schema

@schema
class ProductSchema:
    id: str
    name: str
    description: str
    price: float

product_schema = ProductSchema()

# Create in-memory source
source = InMemorySource(product_schema)

# Add test data
test_products = [
    {"id": "1", "name": "Laptop", "description": "Gaming laptop", "price": 999.99},
    {"id": "2", "name": "Mouse", "description": "Wireless mouse", "price": 29.99}
]

source.put(test_products)

Rapid Prototyping

Ideal for quick experimentation and proof-of-concepts:
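# Quick prototype setup; document_schema is a schema instance with
# id, title, and content fields, defined the same way as product_schema above
source = InMemorySource(document_schema)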

# Add sample documents
sample_docs = [
    {"id": "doc1", "title": "AI Overview", "content": "Introduction to AI concepts"},
    {"id": "doc2", "title": "ML Basics", "content": "Machine learning fundamentals"}
]

source.put(sample_docs)

# Immediately available for vector search testing

Interactive Development

Great for Jupyter notebooks and interactive development:
# Start with an empty source; article_schema is a schema instance
# with id, title, and content fields
source = InMemorySource(article_schema)

# Add data incrementally during development
source.put({"id": "1", "title": "First Article", "content": "Content here"})

# Test search functionality
# ... run queries ...

# Add more data as needed
source.put({"id": "2", "title": "Second Article", "content": "More content"})

Demo Applications

Excellent for demonstrations and training:
# Demo setup with realistic data; movie_schema is a schema instance
# whose fields match the demo dataset
demo_source = InMemorySource(movie_schema)

# Load demo dataset
demo_movies = load_demo_movie_data()  # Your demo data function
demo_source.put(demo_movies)

# Ready for live demonstration with pre-loaded data

Data Management

Programmatic Data Input

Data is added programmatically through the put() method inherited from InteractiveSource:
  • Single Items: Add individual data records
  • Batch Input: Add multiple records at once
  • Continuous Updates: Add data while the application is running
  • Real-Time Processing: Data is immediately processed and available for search
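
Accepting both a single record and a batch usually comes down to normalizing the input to a list before processing. The following sketch shows that idea in isolation. as_batch and SketchSource are hypothetical helpers written for this example, not part of the library.

```python
from typing import Any, Mapping, Sequence, Union

Record = Mapping[str, Any]


def as_batch(data: Union[Record, Sequence[Record]]) -> list[Record]:
    """Return a list of records whether one record or many were passed."""
    if isinstance(data, Mapping):
        return [data]
    return list(data)


class SketchSource:
    """Hypothetical stand-in that keeps records in a plain list."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def put(self, data: Union[Record, Sequence[Record]]) -> int:
        batch = as_batch(data)
        self.records.extend(batch)
        return len(batch)  # number of records accepted in this call


source = SketchSource()
source.put({"id": "1", "title": "First Article"})      # single item
source.put([{"id": "2", "title": "Second Article"},     # batch
            {"id": "3", "title": "Third Article"}])
print(len(source.records))
```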

Memory Considerations

Memory Usage: All data is stored in RAM. Monitor memory usage with large datasets to prevent out-of-memory errors.
Data Size Limits: Keep datasets reasonably sized (typically under 1GB) for optimal performance in development scenarios.
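
A rough size check before loading a dataset can catch problems early. The sketch below estimates the payload size from its JSON serialization and compares it with a budget. The helper names are hypothetical, the 1 GB guideline is read here as 10^9 bytes, and the estimate is a lower bound: Python object overhead and the generated vectors take additional memory.

```python
import json


def estimate_payload_bytes(records: list[dict]) -> int:
    """Rough lower bound: size of the records when serialized as UTF-8 JSON."""
    return sum(len(json.dumps(record).encode("utf-8")) for record in records)


def fits_budget(records: list[dict], budget_bytes: int = 1_000_000_000) -> bool:
    """Compare the estimate with a budget (default: the 1 GB guideline as 10^9 bytes)."""
    return estimate_payload_bytes(records) <= budget_bytes


products = [
    {"id": "1", "name": "Laptop", "description": "Gaming laptop", "price": 999.99},
    {"id": "2", "name": "Mouse", "description": "Wireless mouse", "price": 29.99},
]

print(estimate_payload_bytes(products), "bytes (serialized)")
print("within budget:", fits_budget(products))
```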

Data Persistence

No Persistence: Data is lost when the application shuts down. Not suitable for production use cases requiring data durability.
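
If development data needs to survive a restart, one workaround is to keep the raw records in a file and replay them through put() at startup. This is a sketch of that idea using a hypothetical SketchSource stand-in and hypothetical helper names. It is not a feature of InMemorySource, and the vectors are recomputed on every replay.

```python
import json
import tempfile
from pathlib import Path


class SketchSource:
    """Hypothetical stand-in that keeps records in a plain list."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def put(self, records: list[dict]) -> None:
        self.records.extend(records)


def save_snapshot(records: list[dict], path: Path) -> None:
    path.write_text(json.dumps(records), encoding="utf-8")


def load_snapshot(path: Path) -> list[dict]:
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "records.json"
    docs = [
        {"id": "doc1", "title": "AI Overview"},
        {"id": "doc2", "title": "ML Basics"},
    ]

    # First run: load the data and save the raw records alongside.
    first_run = SketchSource()
    first_run.put(docs)
    save_snapshot(first_run.records, snapshot)

    # Simulated restart: a new, empty source is refilled from the snapshot.
    restarted = SketchSource()
    restarted.put(load_snapshot(snapshot))

print(len(restarted.records))
```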

Performance Characteristics

Advantages

  • Speed: Data access and processing involve no disk or network I/O
  • Simplicity: No external dependencies or setup required
  • Flexibility: Easy to modify and test with different datasets

Limitations

  • Memory Constraints: Limited by available RAM
  • No Persistence: Data doesn’t survive application restarts
  • Single Instance: Cannot share data across multiple application instances

Best Practices

Development Workflow

Incremental Development: Start with small datasets in InMemorySource, then migrate to production sources when ready for deployment.

Testing Strategy

Isolated Tests: Each test should create its own InMemorySource instance to ensure test isolation and prevent data contamination.
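
A per-test fixture is one straightforward way to get this isolation. The sketch below uses unittest's setUp with a hypothetical SketchSource stand-in so that it runs without the library installed. In a real test suite, setUp would construct an InMemorySource with your schema instead.

```python
import unittest


class SketchSource:
    """Hypothetical stand-in that keeps records in a plain list."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def put(self, records: list[dict]) -> None:
        self.records.extend(records)


class SourceIsolationTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh source for every test method, so no data leaks between tests.
        self.source = SketchSource()

    def test_put_adds_record(self) -> None:
        self.source.put([{"id": "1", "name": "Laptop"}])
        self.assertEqual(len(self.source.records), 1)

    def test_source_starts_empty(self) -> None:
        # Passes regardless of test order because setUp rebuilt the source.
        self.assertEqual(self.source.records, [])
```

Saved in a test module, these tests can be run with `python -m unittest`.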

Data Management

Schema Validation: Always validate your test data against the schema before adding it to the source, so that schema mismatches are caught early.
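
A pre-check of this kind can be as simple as comparing each record with the expected field names and types. The sketch below does this against a plain Python class whose annotations mirror the ProductSchema fields used earlier. ProductFields and find_schema_errors are hypothetical helpers, and the check is deliberately strict (an int is not accepted where a float is expected).

```python
from typing import get_type_hints


class ProductFields:
    """Plain-Python mirror of the ProductSchema fields (illustrative only)."""

    id: str
    name: str
    description: str
    price: float


def find_schema_errors(record: dict, fields_cls: type) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record fits."""
    hints = get_type_hints(fields_cls)
    errors = []
    for name, expected in hints.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in hints:
            errors.append(f"unexpected field: {name}")
    return errors


good = {"id": "1", "name": "Laptop", "description": "Gaming laptop", "price": 999.99}
bad = {"id": "2", "name": "Mouse", "price": "29.99"}

print(find_schema_errors(good, ProductFields))
print(find_schema_errors(bad, ProductFields))
```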

Integration Example

from superlinked import (
    InMemorySource, InMemoryApp, Index,
    TextSimilaritySpace, InMemoryVectorDatabase, schema
)

# Complete development setup
@schema
class DocumentSchema:
    id: str
    title: str
    content: str
    category: str

document_schema = DocumentSchema()

# Create source and other components
source = InMemorySource(document_schema)
text_space = TextSimilaritySpace(text=document_schema.content)
index = Index([text_space])
vector_db = InMemoryVectorDatabase()

# Create application
app = InMemoryApp(
    sources=[source],
    indices=[index],
    vector_database=vector_db
)

# Add data for development
development_docs = [
    {
        "id": "1", 
        "title": "Getting Started", 
        "content": "Introduction to our system",
        "category": "tutorial"
    },
    {
        "id": "2", 
        "title": "Advanced Features", 
        "content": "Deep dive into advanced functionality",
        "category": "guide"
    }
]

source.put(development_docs)

# Ready for development and testing

Migration Path

When transitioning from development to production:
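# Development with InMemorySource
# (document_schema and index are the objects created in the integration example)
dev_source = InMemorySource(document_schema)

# Production with RestSource; custom_parser is a parser defined elsewhere
prod_source = RestSource(document_schema, parser=custom_parser)

# Same application logic works with both sources
app = ProductionApp(sources=[prod_source], indices=[index])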