Space - Superlinked

The Space class serves as the abstract foundation for all vector space implementations in Superlinked. It defines the interface for transforming data into vector representations that enable similarity search and retrieval operations.

Constructor

Create a new space with the specified fields and type configuration.

Space(fields, type_)

Parameters

fields

Sequence[SchemaField]

required

The sequence of schema fields that this space will process. These fields define what data will be transformed into vectors.

type_

type | TypeAlias

required

The type specification for the space, defining the expected input and output data types for the vector transformation.

The Space class is abstract and cannot be instantiated directly. Use concrete implementations like TextSimilaritySpace, CategoricalSimilaritySpace, or NumberSpace.

Properties

allow_similar_clause

allow_similar_clause: bool

Indicates whether this space supports similarity-based query clauses. When True, the space can be used in similarity searches and “looks like” queries.

annotation

annotation: str

A string annotation that describes the space configuration and purpose. Used for debugging and introspection.

length

length: int

The dimensionality of the vector space, that is, the number of dimensions in each resulting vector. It is determined by the specific space implementation and its configuration; for a TextSimilaritySpace, for example, it equals the output size of the embedding model.
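To make the abstract contract concrete, here is a minimal pure-Python sketch of an abstract base exposing the three properties described above. ToySpace and ToyNumberSpace are hypothetical illustration classes, not Superlinked's implementation. They only mirror the documented shape: an abstract base that cannot be instantiated, plus length, allow_similar_clause, and annotation.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ToySpace(ABC):
    """Hypothetical stand-in for an abstract space; not Superlinked's class."""

    def __init__(self, fields: Sequence[str], type_: type) -> None:
        self.fields = list(fields)
        self.type_ = type_

    @property
    @abstractmethod
    def length(self) -> int:
        """Number of dimensions in the vectors this space produces."""

    @property
    def allow_similar_clause(self) -> bool:
        # Default in this sketch: the space can be targeted by similarity clauses.
        return True

    @property
    def annotation(self) -> str:
        # Human-readable summary for debugging and introspection.
        return f"{type(self).__name__}(fields={self.fields}, length={self.length})"


class ToyNumberSpace(ToySpace):
    """Hypothetical concrete space that encodes one number as a 2-d vector."""

    def __init__(self, field: str) -> None:
        super().__init__([field], float)

    @property
    def length(self) -> int:
        return 2


try:
    ToySpace(["price"], float)  # abstract base: instantiation raises TypeError
except TypeError as exc:
    print("cannot instantiate:", exc)

space = ToyNumberSpace("price")
print(space.length, space.allow_similar_clause)
print(space.annotation)
```

Declaring length as an abstract property means every concrete subclass must say how many dimensions it produces, which is the same responsibility the text assigns to each concrete Superlinked space.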

Space Types

Superlinked provides several specialized space implementations for different data types:

Text Processing

TextSimilaritySpace - For semantic text similarity using embedding models

Categorical Data

CategoricalSimilaritySpace - For categorical data with predefined categories

Numerical Data

NumberSpace - For numerical data with min-max or similarity-based transformations

Time-Based Data

RecencySpace - For time-based decay and recency scoring

Images

ImageSpace - For image similarity using vision models

Custom Transformations

CustomSpace - For custom vector transformations

Vector Transformation Pipeline

Data Flow

Input Processing: Raw data from schema fields is ingested
Type Validation: Data types are validated against the space configuration
Transformation: Data is transformed into vector representations
Normalization: Vectors are normalized according to space requirements
Output: Standardized vectors ready for indexing and similarity search
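The five stages above can be sketched as a chain of small functions. This is a conceptual illustration under stated assumptions, not Superlinked's internal pipeline: the transform step is a hypothetical toy encoding (a value and its square) standing in for a real model, and normalization is assumed to be unit L2 length.

```python
import math
from typing import Sequence


def validate(values: Sequence[object], expected: type) -> list:
    """Type Validation: reject values that do not match the space's input type."""
    for value in values:
        if not isinstance(value, expected):
            raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")
    return list(values)


def transform(values: Sequence[float]) -> list[float]:
    """Transformation: hypothetical toy encoding (value, value squared) standing in for a model."""
    return [x for v in values for x in (v, v * v)]


def normalize(vector: Sequence[float]) -> list[float]:
    """Normalization: scale to unit L2 length so dot products behave like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def run_pipeline(raw: Sequence[object]) -> list[float]:
    # Input Processing -> Type Validation -> Transformation -> Normalization -> Output
    return normalize(transform(validate(raw, float)))


vec = run_pipeline([3.0])
print(vec)
```

Passing a string such as "3" fails at the validation stage, before any transformation work is done.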

Example Workflow

from superlinked.framework import IdField, Schema, String, TextSimilaritySpace

# Schema fields use Superlinked field types, not plain Python types
class DocumentSchema(Schema):
    id: IdField  # every schema needs an ID field
    title: String
    content: String

document_schema = DocumentSchema()

# Create a text similarity space
text_space = TextSimilaritySpace(
    text=document_schema.content,
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
print(f"Vector dimensions: {text_space.length}")  # 384
print(f"Supports similarity queries: {text_space.allow_similar_clause}")  # True

Design Patterns

Composition Pattern

Spaces can be combined in indexes for multi-dimensional similarity:

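from superlinked.framework import (CategoricalSimilaritySpace, Float, IdField, Index,
    Mode, NumberSpace, Schema, String, TextSimilaritySpace)

class ProductSchema(Schema):
    id: IdField
    description: String
    category: String
    price: Float

product_schema = ProductSchema()

# Multiple spaces for different aspects of data
text_space = TextSimilaritySpace(text=product_schema.description)
category_space = CategoricalSimilaritySpace(
    category_input=product_schema.category,
    categories=["electronics", "clothing", "books"]
)
# NumberSpace needs a range and a mode; MINIMUM favors lower prices
price_space = NumberSpace(number=product_schema.price, min_value=0, max_value=1000, mode=Mode.MINIMUM)

# Combine in an index
product_index = Index([text_space, category_space, price_space])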
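One way to picture an index that combines several spaces is to concatenate each space's normalized vector, scaled by a per-space weight, and score documents with a dot product. This is an assumption made for illustration; Superlinked's actual combination and weighting logic may differ, and the per-space vectors below are hypothetical hand-written values.

```python
import math
from typing import Sequence


def unit(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def combine(parts: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Concatenate per-space unit vectors, each scaled by its weight."""
    out: list[float] = []
    for part, w in zip(parts, weights):
        out.extend(w * x for x in unit(part))
    return out


def score(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two combined vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical per-space vectors: text (3-d), category (3-d n-hot), price (2-d).
doc = combine([[0.2, 0.9, 0.1], [1, 0, 0], [0.6, 0.8]], weights=[1.0, 1.0, 1.0])
# The query down-weights category and ignores price entirely (weight 0).
query = combine([[0.1, 1.0, 0.0], [1, 0, 0], [0.8, 0.6]], weights=[1.0, 0.5, 0.0])
print(len(doc), round(score(doc, query), 3))
```

Under this model, the combined length is the sum of the per-space lengths, and a weight of 0 removes a space's influence on the score without changing the stored vectors.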

Strategy Pattern

Different space types implement the same interface with different strategies:

TextSimilaritySpace: Uses embedding models for semantic similarity
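CategoricalSimilaritySpace: Uses n-hot encoding over the predefined categories
NumberSpace: Maps values within the configured min_value to max_value range into a vector, with a mode (such as MINIMUM, MAXIMUM, or SIMILAR) deciding which values score highest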
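The strategy idea can be sketched with two interchangeable encoders that share one interface. These are deliberately simplified strategies chosen for illustration (an n-hot category encoder and a clamped min-max number encoder); they are not Superlinked's actual encoders, and EncodingStrategy is a hypothetical name.

```python
from typing import Protocol, Sequence


class EncodingStrategy(Protocol):
    """Hypothetical shared interface: a fixed output length and an encode method."""

    length: int

    def encode(self, value: object) -> list[float]: ...


class NHotCategories:
    """Categorical strategy: one slot per known category (several may be set)."""

    def __init__(self, categories: Sequence[str]) -> None:
        self.categories = list(categories)
        self.length = len(self.categories)

    def encode(self, value: object) -> list[float]:
        values = {value} if isinstance(value, str) else set(value)
        return [1.0 if c in values else 0.0 for c in self.categories]


class MinMaxNumber:
    """Numeric strategy: clamp into [min_value, max_value], then rescale to [0, 1]."""

    def __init__(self, min_value: float, max_value: float) -> None:
        self.min_value, self.max_value = min_value, max_value
        self.length = 1

    def encode(self, value: object) -> list[float]:
        x = min(max(float(value), self.min_value), self.max_value)
        return [(x - self.min_value) / (self.max_value - self.min_value)]


strategies: list[EncodingStrategy] = [
    NHotCategories(["electronics", "clothing", "books"]),
    MinMaxNumber(0, 1000),
]
print(strategies[0].encode(["books", "electronics"]))  # [1.0, 0.0, 1.0]
print(strategies[1].encode(250))  # [0.25]
```

Because both classes expose length and encode, calling code can treat them uniformly, which is the point of the strategy pattern described above.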

Interface Contracts

HasTransformationConfig

Provides configuration for how data is transformed into vectors:

transformation_config: TransformationConfig

HasLength

Defines the dimensionality of the resulting vectors:

length: int

HasSpaceFieldSet (for some implementations)

Manages the fields and their processing within the space:

space_field_set: SpaceFieldSet
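Contracts like HasLength can be expressed structurally in Python with typing.Protocol, so any object exposing the right attribute satisfies them. The sketch below is an assumption about how such a contract could look, not Superlinked's definition; FakeTextSpace and FakeCategorySpace are hypothetical classes, and summing lengths assumes the combined vector is a concatenation of per-space vectors.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasLength(Protocol):
    """Structural contract: anything exposing an integer `length`."""

    length: int


class FakeTextSpace:
    length = 384  # hypothetical dimensionality


class FakeCategorySpace:
    length = 3  # hypothetical: three categories


def total_dimensions(spaces: list) -> int:
    """Sum the lengths of objects that satisfy HasLength; reject anything else."""
    for space in spaces:
        if not isinstance(space, HasLength):
            raise TypeError(f"{type(space).__name__} does not satisfy HasLength")
    return sum(space.length for space in spaces)


print(total_dimensions([FakeTextSpace(), FakeCategorySpace()]))  # 387
```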

Use Cases

Semantic Search

Create spaces for finding semantically similar content:

# Semantic text search
content_space = TextSimilaritySpace(
    text=article_schema.content,
    model="sentence-transformers/all-mpnet-base-v2"
)
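At query time, semantic search amounts to ranking stored vectors by their similarity to the query vector. The sketch below uses cosine similarity over hypothetical 3-d "embeddings" so it runs without a model; a real model such as all-mpnet-base-v2 produces 768-dimensional vectors, and Superlinked's own scoring may differ in detail.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings keyed by article slug.
articles = {
    "intro-to-vector-search": [0.9, 0.1, 0.0],
    "baking-sourdough": [0.0, 0.2, 0.95],
    "ann-index-tuning": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranked = sorted(articles, key=lambda name: cosine(articles[name], query), reverse=True)
print(ranked)
```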

Multi-Modal Search

Combine different data types for comprehensive search:

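from superlinked.framework import CategoricalSimilaritySpace, Mode, NumberSpace, TextSimilaritySpace

# Product search combining text, category, and price
product_spaces = [
    TextSimilaritySpace(text=product_schema.description),
    CategoricalSimilaritySpace(
        category_input=product_schema.category,
        categories=category_list
    ),
    NumberSpace(number=product_schema.price, min_value=0, max_value=1000, mode=Mode.MINIMUM)
]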

Recommendation Systems

Build recommendation engines using multiple signal types:

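from superlinked.framework import CategoricalSimilaritySpace, Mode, NumberSpace, RecencySpace

# User preference modeling
user_spaces = [
    CategoricalSimilaritySpace(
        category_input=user_schema.preferences,
        categories=preference_categories
    ),
    RecencySpace(timestamp=interaction_schema.timestamp),
    # NumberSpace requires a range and a mode; these age bounds are illustrative
    NumberSpace(number=user_schema.age, min_value=18, max_value=100, mode=Mode.SIMILAR)
]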
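A recency signal rewards recent interactions and fades older ones. The sketch below assumes a simple exponential decay with a configurable half-life to show the idea; this is not RecencySpace's actual formula (Superlinked configures recency with its own period settings), and recency_score is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def recency_score(timestamp: datetime, now: datetime, half_life: timedelta) -> float:
    """Exponential decay: 1.0 at 'now', 0.5 after one half-life, 0.25 after two."""
    age = max((now - timestamp).total_seconds(), 0.0)  # future timestamps count as age 0
    return 0.5 ** (age / half_life.total_seconds())


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
week = timedelta(days=7)
for days_ago in (0, 7, 14):
    print(days_ago, recency_score(now - timedelta(days=days_ago), now, week))
```

The half-life is the main tuning knob: a shorter one makes the recommendation signal favor very recent activity more aggressively.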

Best Practices

Space Selection: Choose the appropriate space type based on your data characteristics. Use TextSimilaritySpace for unstructured text, CategoricalSimilaritySpace for discrete categories, and NumberSpace for continuous numerical values.

Dimensionality: Consider the trade-off between vector dimensionality and performance. Higher dimensions can capture more nuanced relationships but require more computational resources.
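A quick back-of-the-envelope calculation makes the storage side of this trade-off concrete. The sketch assumes dense float32 vectors (4 bytes per value) and ignores index overhead; 384 and 768 are the output sizes of all-MiniLM-L6-v2 and all-mpnet-base-v2 respectively.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors (float32 by default), ignoring index overhead."""
    return num_vectors * dimensions * bytes_per_value


for dims in (384, 768):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB for 1M vectors")
```

Doubling the dimensionality doubles raw storage and, roughly, the cost of each brute-force similarity computation.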

Type Consistency: Ensure your schema field types match the expected input types for your chosen space. Type mismatches will cause runtime errors.

Combination Strategy: When using multiple spaces in an index, consider how they will be combined. Different space types may require different weighting strategies for optimal results.

Advanced Configuration

Custom Transformations

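For specialized use cases, implement custom transformation logic. In this outline, ProductVector and custom_vector_transform are placeholders you would define yourself:

class CustomProductSpace(Space):
    def __init__(self, fields):
        # ProductVector: placeholder for the type_ describing this space's input
        super().__init__(fields, ProductVector)

    def transform(self, data):
        # custom_vector_transform: placeholder for your own encoding function
        return custom_vector_transform(data)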

Performance Optimization

Configure spaces for optimal performance:

# Optimize text space for large-scale processing
text_space = TextSimilaritySpace(
    text=document_schema.content,
    model="sentence-transformers/all-MiniLM-L6-v2",
    cache_size=50000,  # Larger cache for better performance
    model_cache_dir="/path/to/model/cache"
)
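The benefit of a larger cache comes from skipping repeated model calls for inputs that have already been embedded. The sketch below illustrates that effect with functools.lru_cache and a hypothetical expensive_embed stand-in; it does not show how Superlinked implements cache_size internally.

```python
from functools import lru_cache

calls = 0  # counts how often the "model" actually runs


def expensive_embed(text: str) -> tuple[float, ...]:
    """Hypothetical stand-in for a model call."""
    global calls
    calls += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=50_000)  # similar in spirit to a larger cache_size
def cached_embed(text: str) -> tuple[float, ...]:
    return expensive_embed(text)


for doc in ["red shoes", "blue shirt", "red shoes", "red shoes"]:
    cached_embed(doc)

print(calls, cached_embed.cache_info().hits)  # 2 2
```

Caching only pays off when inputs repeat; for a stream of unique documents, a larger cache mostly adds memory use.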
