The Space System provides the foundation for creating vector embeddings from different data types, offering specialized space implementations for text, images, categorical data, numbers, and temporal information. Spaces define how raw data is transformed into vector representations that enable semantic similarity calculations and multi-modal search. For information about how spaces integrate with data schemas, see Schema System. For index creation and querying, see Index and Query System.

Space Type Reference

| Space Type | Data Input | Primary Use Case | Key Parameters |
| --- | --- | --- | --- |
| TextSimilaritySpace | Text strings | Semantic text similarity | model, chunking_method |
| NumberSpace | Numerical values | Range-based similarity | min_value, max_value, mode |
| CategoricalSimilaritySpace | Category labels | Discrete category matching | categories, uncategorized_as_category |
| RecencySpace | Timestamps | Time-based relevance | period_time_list, negative_filter |
| ImageSpace | Image data | Visual similarity | model, image_size |
| CustomSpace | Any data type | Specialized embeddings | Custom encoder function |
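
To make "range-based similarity" concrete, here is a minimal sketch of one way a bounded number can become a vector: clamp the value to [min_value, max_value], normalize it, and place it on a quarter circle so that the dot product of two encoded values shrinks as the values move apart. This is an illustration under assumed design choices, not Superlinked's actual encoder; `encode_number` and `dot` are hypothetical helpers.

```python
import math


def encode_number(value: float, min_value: float, max_value: float) -> tuple[float, float]:
    """Map a bounded number onto a quarter circle (hypothetical helper)."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    # Clamp out-of-range inputs to the configured bounds.
    clamped = min(max(value, min_value), max_value)
    # Normalize to [0, 1], then to an angle in [0, pi/2].
    fraction = (clamped - min_value) / (max_value - min_value)
    angle = fraction * math.pi / 2
    return (math.cos(angle), math.sin(angle))


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


cheap = encode_number(100.0, 0.0, 1000.0)
similar = encode_number(150.0, 0.0, 1000.0)
expensive = encode_number(900.0, 0.0, 1000.0)

# Nearby prices score higher than distant ones.
print(dot(cheap, similar) > dot(cheap, expensive))
```

Because both vectors have unit length, the dot product is the cosine of the angle between them, so the score depends only on how far apart the two normalized values are.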

Space Components Reference

Space Implementation Guide

Spaces are instantiated with schema field references and configuration parameters to create embeddings for specific data types.

Basic Space Configuration

from datetime import datetime, timedelta

from superlinked import framework as sl

# Define schema
@sl.schema
class ProductSchema:
    description: sl.String
    price: sl.Float
    category: sl.String
    created_at: sl.Timestamp
    id: sl.IdField

product = ProductSchema()

# Create spaces for different data types
text_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-mpnet-base-v2"
)

number_space = sl.NumberSpace(
    number=product.price,
    min_value=0.0,
    max_value=1000.0,
    mode=sl.Mode.MINIMUM  # mode is required; MINIMUM favors lower prices (assumed preference)
)

category_space = sl.CategoricalSimilaritySpace(
    category_input=product.category,
    categories=["electronics", "clothing", "books"]
)

recency_space = sl.RecencySpace(
    timestamp=product.created_at,
    period_time_list=[
        sl.PeriodTime(timedelta(days=1)),
        sl.PeriodTime(timedelta(days=7)),
        sl.PeriodTime(timedelta(days=30))
    ]
)
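
As a rough illustration of what `period_time_list` expresses, the sketch below scores a timestamp against several periods: within each period the score decays linearly from 1 (now) to 0 (one full period ago), and items older than the longest period receive a fixed fallback score. The linear decay, the averaging across periods, and the `recency_score` helper are assumptions made for this example; the library's own encoding may differ.

```python
from datetime import datetime, timedelta


def recency_score(created_at, now, periods, older_than_all=0.0):
    """Average linear-decay score across periods (hypothetical helper)."""
    # Treat timestamps in the future as "created now".
    age = max(now - created_at, timedelta(0))
    # Items older than every period get the fallback score.
    if age > max(periods):
        return older_than_all
    # Dividing one timedelta by another yields a float ratio.
    per_period = [max(0.0, 1.0 - age / period) for period in periods]
    return sum(per_period) / len(per_period)


now = datetime(2024, 1, 31)
periods = [timedelta(days=1), timedelta(days=7), timedelta(days=30)]

fresh = recency_score(now - timedelta(hours=12), now, periods)
week_old = recency_score(now - timedelta(days=7), now, periods)
stale = recency_score(now - timedelta(days=90), now, periods)

print(fresh > week_old > stale)
```

Using several periods lets short windows separate very recent items from each other while long windows still rank older items.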

Space Architecture Flow

# Multi-language text space
text_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    chunking_method=sl.TextChunkingMethod.WORD,
    chunk_size=100,
    chunk_overlap=20
)

# Similarity-based number space
price_space = sl.NumberSpace(
    number=product.price,
    min_value=0.0,
    max_value=1000.0,
    mode=sl.Mode.SIMILAR
)

# Dynamic recency with multiple periods
recency_space = sl.RecencySpace(
    timestamp=product.created_at,
    period_time_list=[
        sl.PeriodTime(timedelta(hours=1)),   # Recent items
        sl.PeriodTime(timedelta(days=1)),    # Daily relevance  
        sl.PeriodTime(timedelta(days=7)),    # Weekly trends
        sl.PeriodTime(timedelta(days=30))    # Monthly patterns
    ],
    negative_filter=-0.25  # Score given to items older than the longest period
)

# Custom space with preprocessing
# (custom_embedding_function is a user-supplied callable, not defined here)
custom_space = sl.CustomSpace(
    input_=product.description,
    encoder=custom_embedding_function,
    dimension=512
)

Spaces can be combined within indices to create rich, multi-dimensional embeddings that capture different aspects of your data.
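
One way to picture this combination: if each space yields its own vector, concatenating them gives a single embedding, and scaling each segment of the query by a weight makes the overall dot product equal to the weighted sum of the per-space similarities. The sketch below demonstrates that identity with plain Python lists. The vectors and weights are made-up values, and `combine` is a hypothetical helper rather than part of Superlinked.

```python
def combine(vectors, weights=None):
    """Concatenate per-space vectors, scaling each segment by its weight."""
    weights = weights or [1.0] * len(vectors)
    combined = []
    for vector, weight in zip(vectors, weights):
        combined.extend(weight * x for x in vector)
    return combined


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up per-space vectors for one stored item and one query.
item_text, item_price = [0.6, 0.8], [1.0, 0.0]
query_text, query_price = [0.8, 0.6], [0.0, 1.0]

item = combine([item_text, item_price])
# Weights are applied on the query side: text matters 4x more than price here.
query = combine([query_text, query_price], weights=[2.0, 0.5])

score = dot(query, item)
expected = 2.0 * dot(query_text, item_text) + 0.5 * dot(query_price, item_price)
print(abs(score - expected) < 1e-9)
```

Applying the weights to the query alone means stored item vectors never need to be recomputed when the weighting changes.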

Index Integration

# Combine multiple spaces in an index
index = sl.Index([text_space, number_space, category_space, recency_space])

# Create a query that combines similarity search with hard filters
query = (
    sl.Query(index)
    .find(product)
    .similar(text_space.text, sl.Param("search_text"))
    .filter(product.price < sl.Param("max_price"))
    .filter(product.category == sl.Param("target_category"))
    .filter(product.created_at > sl.Param("since_date"))
    .limit(10)
)

# Execute with parameters (`app` is the running application created in
# "Parser and Source Integration")
results = app.query(
    query,
    search_text="wireless headphones",
    max_price=200.0,
    target_category="electronics",
    since_date=datetime.now() - timedelta(days=30)
)

Multi-Modal Query Flow

Advanced space configurations use field sets to define complex input patterns and aggregation strategies.

Field Set Configuration

# Image space with field set
# (assumes the schema also declares an image field; ProductSchema above does not)
image_field_set = sl.ImageSpaceFieldSet(
    image_data=product.image,
    metadata=product.description
)

image_space = sl.ImageSpace(
    image=image_field_set,
    model="clip-vit-base-patch32"
)

# Text space with aggregation
text_field_set = sl.SpaceFieldSet(
    text=product.description,
    aggregation_mode=sl.InputAggregationMode.MEAN
)

# Custom space with complex field mapping
custom_field_set = sl.SpaceFieldSet(
    primary_field=product.description,
    secondary_field=product.category,
    aggregation_mode=sl.InputAggregationMode.CONCATENATE
)

Spaces plug into Superlinked's parsers, sources, and application setup to form a complete vector search system.

Parser and Source Integration

# Schema and parser setup
parser = sl.DataFrameParser(
    product,
    mapping={
        product.id: "product_id",
        product.description: "description_text",
        product.price: "price_value",
        product.category: "category_name",
        product.created_at: "timestamp"
    }
)

# Source configuration  
source = sl.InMemorySource(product, parser=parser)

# Application setup: the executor wires sources to indices, and run() returns the app
executor = sl.InMemoryExecutor(sources=[source], indices=[index])
app = executor.run()

Key Features

Space components provide:
  • Multi-Modal Support: Handle text, images, numbers, categories, and time data
  • Semantic Similarity: A similarity measure suited to each data type
  • Flexible Configuration: Per-space parameters such as model, value range, and period list
  • Aggregation Strategies: Multiple ways to handle multi-value inputs
  • Custom Implementations: Extensible architecture for specialized embeddings

Spaces define how different types of data are transformed into vector representations. Each space type is optimized for specific data characteristics and similarity calculations.
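
For instance, discrete categories are commonly embedded as one-hot vectors, so identical labels score 1 and different labels score 0. The sketch below is a generic illustration of that idea, including an optional extra slot for unseen labels that loosely mirrors the `uncategorized_as_category` parameter listed in the reference table. `one_hot` is a hypothetical helper, not the library's encoder.

```python
def one_hot(label, categories, uncategorized_as_category=True):
    """One-hot encode a label, optionally reserving a slot for unknown labels."""
    size = len(categories) + (1 if uncategorized_as_category else 0)
    vector = [0.0] * size
    if label in categories:
        vector[categories.index(label)] = 1.0
    elif uncategorized_as_category:
        # Unknown labels share the final "other" slot.
        vector[-1] = 1.0
    return vector


categories = ["electronics", "clothing", "books"]

print(one_hot("clothing", categories))
print(one_hot("toys", categories))
print(one_hot("toys", categories, uncategorized_as_category=False))
```

With the extra slot enabled, two unknown labels match each other; with it disabled, an unknown label becomes an all-zero vector and matches nothing.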

Vector Space Concepts

Spaces handle:
  1. Data Transformation: Convert raw data into vector representations
  2. Similarity Calculation: Define how similarity is measured in the vector space
  3. Dimensionality: Control the size and complexity of embeddings
  4. Aggregation: Combine multiple values into single embeddings
  5. Normalization: Ensure vectors are properly scaled for comparison
  6. Model Integration: Interface with pre-trained models and custom encoders
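
The aggregation and normalization steps in the list above can be sketched as follows: several vectors belonging to the same item are averaged element-wise, and the result is rescaled to unit length so that dot products behave like cosine similarity. This is a generic illustration; `mean_aggregate` and `l2_normalize` are hypothetical helpers, and the input vectors are made-up values.

```python
import math


def mean_aggregate(vectors):
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Three made-up embeddings for the same item (for example, one per text chunk).
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

aggregated = mean_aggregate(chunks)
unit = l2_normalize(aggregated)

print(abs(sum(x * x for x in unit) - 1.0) < 1e-9)
```

Normalizing after aggregation matters because the mean of unit vectors is generally shorter than unit length, which would otherwise lower every similarity score for multi-value items.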