Space System

The Space System provides the foundation for creating vector embeddings from different data types, offering specialized space implementations for text, images, categorical data, numbers, and temporal information. Spaces define how raw data is transformed into vector representations that enable semantic similarity calculations and multi-modal search. For information about how spaces integrate with data schemas, see Schema System. For index creation and querying, see Index and Query System.

Space Type Reference

Space Type	Data Input	Primary Use Case	Key Parameters
`TextSimilaritySpace`	Text strings	Semantic text similarity	`model`, `chunking_method`
`NumberSpace`	Numerical values	Range-based similarity	`min_value`, `max_value`, `mode`
`CategoricalSimilaritySpace`	Category labels	Discrete category matching	`categories`, `uncategorized_as_category`
`RecencySpace`	Timestamps	Time-based relevance	`period_time_list`, `negative_filter`
`ImageSpace`	Image data	Visual similarity	`model`, `image_size`
`CustomSpace`	Any data type	Specialized embeddings	Custom `encoder` function

Space Components Reference

Core Components

Foundation components for vector space definition:

Space

Base space class and core functionality

Space Field Set

Field set definitions for space configurations

Custom Space

Custom space implementations for specialized use cases

Exception

Exception handling for space operations

Space Implementation Guide

Space Definition and Usage

Spaces are instantiated with schema field references and configuration parameters to create embeddings for specific data types.

Basic Space Configuration

from datetime import datetime, timedelta

from superlinked import framework as sl

# Define schema
@sl.schema
class ProductSchema:
    description: sl.String
    price: sl.Float
    category: sl.String
    created_at: sl.Timestamp
    id: sl.IdField

product = ProductSchema()

# Create spaces for different data types
text_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-mpnet-base-v2"
)

number_space = sl.NumberSpace(
    number=product.price,
    min_value=0.0,
    max_value=1000.0
)

category_space = sl.CategoricalSimilaritySpace(
    category_input=product.category,
    categories=["electronics", "clothing", "books"]
)

recency_space = sl.RecencySpace(
    timestamp=product.created_at,
    period_time_list=[
        sl.PeriodTime(timedelta(days=1)),
        sl.PeriodTime(timedelta(days=7)),
        sl.PeriodTime(timedelta(days=30))
    ]
)
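
To make the role of `min_value` and `max_value` more concrete, here is a minimal sketch of one way a range-based number embedding could work. It is plain Python for illustration only, not Superlinked's implementation: the function name `embed_number` and the quarter-circle encoding are assumptions chosen for clarity.

```python
import math


def embed_number(value: float, min_value: float, max_value: float) -> tuple[float, float]:
    """Map a number onto a quarter circle so nearby values get similar vectors."""
    # Clamp to the configured range, then rescale to [0, 1].
    clamped = min(max(value, min_value), max_value)
    scaled = (clamped - min_value) / (max_value - min_value)
    # Turn the scaled value into an angle between 0 and 90 degrees.
    angle = scaled * math.pi / 2
    return (math.sin(angle), math.cos(angle))


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


cheap = embed_number(100.0, 0.0, 1000.0)
nearby = embed_number(150.0, 0.0, 1000.0)
pricey = embed_number(900.0, 0.0, 1000.0)

# Prices that are close together score higher than prices that are far apart.
print(dot(cheap, nearby) > dot(cheap, pricey))
```

In this sketch, values outside the configured range are clamped, so the choice of `min_value` and `max_value` decides how much of the vector space is spent on the values that actually occur.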

[Diagram: Space Architecture Flow]

Advanced Space Configuration

# Multi-language text space
text_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    chunking_method=sl.TextChunkingMethod.WORD,
    chunk_size=100,
    chunk_overlap=20
)

# Similarity-based number space
price_space = sl.NumberSpace(
    number=product.price,
    min_value=0.0,
    max_value=1000.0,
    mode=sl.Mode.SIMILAR
)

# Dynamic recency with multiple periods
recency_space = sl.RecencySpace(
    timestamp=product.created_at,
    period_time_list=[
        sl.PeriodTime(timedelta(hours=1)),   # Recent items
        sl.PeriodTime(timedelta(days=1)),    # Daily relevance  
        sl.PeriodTime(timedelta(days=7)),    # Weekly trends
        sl.PeriodTime(timedelta(days=30))    # Monthly patterns
    ],
    negative_filter=-0.25  # Illustrative score for items older than every period (a float, not a time span)
)

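# Custom space with preprocessing.
# `custom_embedding_function` is a placeholder for a user-defined function
# that returns a 512-dimensional vector for a given text.
custom_space = sl.CustomSpace(
    input_=product.description,
    encoder=custom_embedding_function,
    dimension=512
)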
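
The interaction between several periods and a fallback score for old items can be sketched in plain Python. This is a simplified illustration, not the library's actual formula: the function `recency_score`, the linear decay, and the averaging across periods are all assumptions, and `negative_filter` is treated here as a numeric score for items older than the longest period.

```python
from datetime import datetime, timedelta


def recency_score(created_at, now, periods, negative_filter=0.0):
    """Score an item between 0 and 1 by age, averaged over several periods."""
    # Treat items dated in the future as brand new.
    age = max(now - created_at, timedelta(0))
    # Items older than the longest period get the fallback score.
    if age > max(periods):
        return negative_filter
    # Each period contributes a linear decay from 1 (new) to 0 (period elapsed).
    per_period = [max(0.0, 1.0 - age / period) for period in periods]
    return sum(per_period) / len(per_period)


now = datetime(2024, 1, 31)
periods = [timedelta(days=1), timedelta(days=7), timedelta(days=30)]

fresh = recency_score(now, now, periods)
week_old = recency_score(now - timedelta(days=7), now, periods)
stale = recency_score(now - timedelta(days=60), now, periods, negative_filter=-0.25)
print(fresh, week_old, stale)
```

Short periods separate very recent items from each other, while long periods keep older items from collapsing to the same score.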

Multi-Modal Space Combination

Spaces can be combined within indices to create rich, multi-dimensional embeddings that capture different aspects of your data.
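
One way to picture this combination is as a weighted concatenation of per-space vectors. The sketch below is an assumption-laden illustration in plain Python, not the framework's internals: `combine_spaces` is a hypothetical helper, and the toy vectors stand in for the output of a text space and a number space.

```python
def combine_spaces(space_vectors, weights):
    """Concatenate per-space vectors, scaling each one by its weight."""
    combined = []
    for vector, weight in zip(space_vectors, weights):
        combined.extend(weight * x for x in vector)
    return combined


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy unit vectors: first entry is the "text" part, second the "number" part.
query_parts = [[1.0, 0.0], [0.0, 1.0]]
item_parts = [[0.6, 0.8], [0.0, 1.0]]

# Weight the text part twice as heavily as the number part on the query side.
query_vector = combine_spaces(query_parts, weights=[2.0, 1.0])
item_vector = combine_spaces(item_parts, weights=[1.0, 1.0])

# The combined score equals the weighted sum of the per-space similarities.
score = dot(query_vector, item_vector)
per_space = 2.0 * dot(query_parts[0], item_parts[0]) + 1.0 * dot(query_parts[1], item_parts[1])
print(score, per_space)
```

Under this assumption, changing a weight at query time shifts the balance between spaces without re-embedding the stored items.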

Index Integration

# Combine multiple spaces in an index
index = sl.Index([text_space, number_space, category_space, recency_space])

# Create query with multi-space filtering
query = (
    sl.Query(index)
    .find(product)
    .similar(text_space.text, sl.Param("search_text"))
    .filter(number_space.number < sl.Param("max_price"))
    .filter(category_space.category_input == sl.Param("target_category"))
    .filter(recency_space.timestamp > sl.Param("since_date"))
    .limit(10)
)

# Execute with parameters; `app` is the running application
# created under Parser and Source Integration
results = app.query(
    query,
    search_text="wireless headphones",
    max_price=200.0,
    target_category="electronics",
    since_date=datetime.now() - timedelta(days=30)
)

[Diagram: Multi-Modal Query Flow]

Space Field Sets and Configuration

Advanced space configurations use field sets to define complex input patterns and aggregation strategies.

Field Set Configuration

# Image space with field set (assumes ProductSchema also declares an `image` field)
image_field_set = sl.ImageSpaceFieldSet(
    image_data=product.image,
    metadata=product.description
)

image_space = sl.ImageSpace(
    image=image_field_set,
    model="clip-vit-base-patch32"
)

# Text space with aggregation
text_field_set = sl.SpaceFieldSet(
    text=product.description,
    aggregation_mode=sl.InputAggregationMode.MEAN
)

# Custom space with complex field mapping
custom_field_set = sl.SpaceFieldSet(
    primary_field=product.description,
    secondary_field=product.category,
    aggregation_mode=sl.InputAggregationMode.CONCATENATE
)
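
The two aggregation modes named above can be illustrated with a small plain-Python sketch. The helpers `aggregate_mean` and `aggregate_concatenate` are hypothetical and only show the arithmetic; how the library applies these modes internally is not covered here.

```python
def aggregate_mean(vectors):
    """Average several equal-length vectors element by element."""
    if not vectors:
        raise ValueError("at least one vector is required")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def aggregate_concatenate(vectors):
    """Join vectors end to end, keeping every input dimension."""
    return [x for vector in vectors for x in vector]


title_vector = [1.0, 0.0]
body_vector = [0.0, 1.0]

print(aggregate_mean([title_vector, body_vector]))         # same dimension as the inputs
print(aggregate_concatenate([title_vector, body_vector]))  # dimension is the sum of the inputs
```

Mean aggregation keeps the embedding size fixed regardless of how many values are combined, whereas concatenation preserves each input at the cost of a larger vector.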

Integration with Framework Components

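Spaces work together with parsers, sources, and the application object to form a complete vector search system: a parser maps incoming data onto the schema, a source ingests it, and the app serves queries against the indices built from the spaces.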

Parser and Source Integration

# Schema and parser setup
parser = sl.DataFrameParser(
    product,
    mapping={
        product.id: "product_id",
        product.description: "description_text",
        product.price: "price_value",
        product.category: "category_name",
        product.created_at: "timestamp"
    }
)

# Source configuration  
source = sl.InMemorySource(product, parser=parser)

# Application setup: the executor binds sources to indices and returns a running app
executor = sl.InMemoryExecutor(sources=[source], indices=[index])
app = executor.run()
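
The field-to-column mapping used by the parser can be pictured with plain dictionaries. The sketch below is not the `DataFrameParser` implementation: `parse_rows` is a hypothetical helper, and schema fields are represented by simple strings for illustration.

```python
def parse_rows(rows, mapping):
    """Rename source columns to schema field names, dropping unmapped columns."""
    parsed = []
    for row in rows:
        # Fail early if a mapped column is absent from the incoming row.
        missing = [column for column in mapping.values() if column not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        parsed.append({field: row[column] for field, column in mapping.items()})
    return parsed


mapping = {
    "id": "product_id",
    "description": "description_text",
    "price": "price_value",
}

rows = [
    {"product_id": "p1", "description_text": "wireless headphones",
     "price_value": 149.0, "warehouse": "north"},
]

print(parse_rows(rows, mapping))
```

Columns that are not part of the mapping, such as `warehouse` here, are ignored in this sketch, which mirrors the idea that only schema fields reach the spaces.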

Key Features

Space components provide:

Multi-Modal Support: Handle text, images, numbers, categories, and time data
Semantic Similarity: Similarity calculations suited to each data type
Flexible Configuration: Per-space parameters such as the model, value range, categories, and time periods
Aggregation Strategies: Multiple ways to handle multi-value inputs
Custom Implementations: Extensible architecture for specialized embeddings

Spaces define how different types of data are transformed into vector representations. Each space type is optimized for specific data characteristics and similarity calculations.
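
As an illustration of how a discrete data type can be turned into a vector, here is a minimal one-hot sketch for category labels. It is an assumption, not the behavior of `CategoricalSimilaritySpace`: the helper `embed_category` is hypothetical, and the way it interprets an `uncategorized_as_category` flag (an extra slot for unknown labels) is a guess made for the example.

```python
def embed_category(label, categories, uncategorized_as_category=True):
    """One-hot encode a label, optionally reserving a slot for unknown labels."""
    size = len(categories) + (1 if uncategorized_as_category else 0)
    vector = [0.0] * size
    if label in categories:
        vector[categories.index(label)] = 1.0
    elif uncategorized_as_category:
        # Unknown labels share the final "other" slot.
        vector[-1] = 1.0
    return vector


categories = ["electronics", "clothing", "books"]

print(embed_category("clothing", categories))
print(embed_category("toys", categories))
print(embed_category("toys", categories, uncategorized_as_category=False))
```

With one-hot vectors, two items match only when they carry the same label, which is the discrete matching behavior a categorical space is meant to capture.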

Vector Space Concepts

Spaces handle:

Data Transformation: Convert raw data into vector representations
Similarity Calculation: Define how similarity is measured in the vector space
Dimensionality: Control the size and complexity of embeddings
Aggregation: Combine multiple values into single embeddings
Normalization: Ensure vectors are properly scaled for comparison
Model Integration: Interface with pre-trained models and custom encoders
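
The normalization and similarity items in this list can be sketched in a few lines of plain Python. The helpers `l2_normalize` and `cosine_similarity` are generic illustrations of the standard definitions, not functions from the Superlinked API.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of the normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Vectors pointing the same way are maximally similar regardless of length.
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))
# Orthogonal vectors have zero similarity.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Normalizing before comparison means similarity reflects direction only, so a longer raw vector does not dominate the score simply because of its magnitude.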

Reference

Space Type Reference

Space Components Reference

Space

Space Field Set

Custom Space

Exception

Text Similarity Space

Image Space

Categorical Similarity Space

Number Space

Recency Space

Image Space Field Set

Input Aggregation Mode

Has Space Field Set
