The Space System provides the foundation for creating vector embeddings from different data types, offering specialized space implementations for text, images, categorical data, numbers, and temporal information. Spaces define how raw data is transformed into vector representations that enable semantic similarity calculations and multi-modal search. For information about how spaces integrate with data schemas, see Schema System. For index creation and querying, see Index and Query System.

Space Type Reference

| Space Type | Data Input | Primary Use Case | Key Parameters |
|---|---|---|---|
| TextSimilaritySpace | Text strings | Semantic text similarity | model, chunking_method |
| NumberSpace | Numerical values | Range-based similarity | min_value, max_value, mode |
| CategoricalSimilaritySpace | Category labels | Discrete category matching | categories, uncategorized_as_category |
| RecencySpace | Timestamps | Time-based relevance | period_time_list, negative_filter |
| ImageSpace | Image data | Visual similarity | model, image_size |
| CustomSpace | Any data type | Specialized embeddings | Custom encoder function |
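
To make the parameters in the table more concrete, here is a minimal sketch of what a range-based number encoding and a discrete category encoding could look like. It is not the library's implementation: the function names `encode_number` and `encode_category` are hypothetical, the `mode` parameter is left out, and only the parameter names `min_value`, `max_value`, `categories`, and `uncategorized_as_category` come from the table above.

```python
from typing import List, Sequence


def encode_number(value: float, min_value: float, max_value: float) -> List[float]:
    """Hypothetical encoder: clamp to [min_value, max_value], then scale to [0, 1]."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    clamped = min(max(value, min_value), max_value)
    return [(clamped - min_value) / (max_value - min_value)]


def encode_category(
    label: str,
    categories: Sequence[str],
    uncategorized_as_category: bool = True,
) -> List[float]:
    """Hypothetical encoder: one-hot over the known categories.

    When uncategorized_as_category is True, one extra slot at the end
    collects every label that is not in `categories`. Otherwise an
    unknown label maps to the all-zero vector.
    """
    known = list(categories)
    size = len(known) + (1 if uncategorized_as_category else 0)
    vector = [0.0] * size
    if label in known:
        vector[known.index(label)] = 1.0
    elif uncategorized_as_category:
        vector[-1] = 1.0
    return vector
```

Under these assumptions, values outside the configured range collapse onto the nearest bound, and two items with the same category label receive identical vectors.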

Space Components Reference

Space Implementation Guide

Key Features

Space components provide:
  • Multi-Modal Support: Handle text, images, numbers, categories, and time data
  • Semantic Similarity: Similarity calculations tailored to each data type
  • Flexible Configuration: Space parameters that can be tuned for each use case
  • Aggregation Strategies: Multiple ways to combine multi-value inputs into one embedding
  • Custom Implementations: Extensible architecture for specialized embeddings

Spaces define how different types of data are transformed into vector representations. Each space type is optimized for specific data characteristics and similarity calculations.
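
One common way to support multi-modal search over several spaces is to concatenate the per-space vectors into a single vector, scaling each part by a weight. The sketch below illustrates that idea only. `combine_spaces` and its arguments are hypothetical names, and this document does not confirm that the library combines spaces this way.

```python
from typing import Dict, List


def combine_spaces(
    space_vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Concatenate per-space vectors, scaling each by its weight.

    Spaces are visited in the dict's insertion order. A space with no
    entry in `weights` gets a default weight of 1.0.
    """
    combined: List[float] = []
    for name, vector in space_vectors.items():
        weight = weights.get(name, 1.0)
        combined.extend(weight * x for x in vector)
    return combined


# Usage: a text vector and a price vector, with price weighted twice as much.
item_vector = combine_spaces(
    {"text": [1.0, 0.0], "price": [0.5]},
    {"price": 2.0},
)
```

Raising a weight increases that space's contribution to any dot-product comparison over the combined vector.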

Vector Space Concepts

Spaces handle:
  1. Data Transformation: Convert raw data into vector representations
  2. Similarity Calculation: Define how similarity is measured in the vector space
  3. Dimensionality: Control the size and complexity of embeddings
  4. Aggregation: Combine multiple values into single embeddings
  5. Normalization: Ensure vectors are properly scaled for comparison
  6. Model Integration: Interface with pre-trained models and custom encoders
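
The aggregation, normalization, and similarity steps above can be sketched in plain Python. This is an illustrative example under stated assumptions, not the library's code: it assumes mean aggregation, L2 normalization, and cosine similarity, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length. A zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def aggregate_mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine several embeddings of equal length into one by averaging."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimensionality")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two L2-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Normalizing before comparison means the similarity score depends on the direction of each vector and not on its magnitude, so differently scaled inputs stay comparable.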