Typed data structure definitions that serve as contracts for data flowing through the Superlinked framework, with support for field types, validation, and event-based modeling
The Schema System provides typed data structure definitions that serve as contracts for data flowing through the Superlinked framework. Schemas define the structure, field types, and relationships of entities processed by spaces, indices, and queries, ensuring type safety and data integrity throughout the pipeline.

For information about how schemas integrate with vector embeddings, see Space System. For query definition and execution, see Index and Query System.
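To make the idea of a schema as a contract concrete without depending on the framework, the following plain-Python analogy uses a dataclass as a stand-in schema and checks raw records against its annotations. It is only a sketch: `Paragraph` and `validate` are hypothetical names, and this is not how Superlinked implements validation internally.

```python
from dataclasses import dataclass, fields


# Hypothetical stand-in schema: each annotation is the contract for one field.
@dataclass(frozen=True)
class Paragraph:
    id: str
    body: str
    created_at: int  # Unix timestamp in seconds


def validate(record: dict, schema: type) -> dict:
    """Check a raw record against the schema's annotations and return it unchanged."""
    expected = {f.name: f.type for f in fields(schema)}
    missing = sorted(expected.keys() - record.keys())
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, field_type in expected.items():
        value = record[name]
        if not isinstance(value, field_type):
            raise TypeError(
                f"{name}: expected {field_type.__name__}, got {type(value).__name__}"
            )
    return record


# A valid record passes; a wrongly typed timestamp would raise TypeError.
validate({"id": "p1", "body": "Hello", "created_at": 1700000000}, Paragraph)
```

The point of the analogy is that the contract lives in one place (the class annotations) and every record is checked against it before it goes any further.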
The schema system supports several built-in field types for different data categories. Each type pairs naturally with a particular kind of space, as the comments in the example indicate.
```python
from datetime import timedelta

from superlinked import framework as sl


# Schema with comprehensive field types
@sl.schema
class ProductSchema:
    description: sl.String  # For TextSimilaritySpace
    price: sl.Float  # For NumberSpace
    category: sl.String  # For CategoricalSimilaritySpace
    created_at: sl.Timestamp  # For RecencySpace
    id: sl.IdField  # For entity identification


product = ProductSchema()

# Field access in spaces
text_space = sl.TextSimilaritySpace(
    text=product.description, model="sentence-transformers/all-mpnet-base-v2"
)
number_space = sl.NumberSpace(
    number=product.price,
    min_value=0.0,
    max_value=1000.0,
    mode=sl.Mode.MINIMUM,  # NumberSpace also needs a mode; MINIMUM favors lower prices
)
recency_space = sl.RecencySpace(
    timestamp=product.created_at,
    period_time_list=[
        sl.PeriodTime(timedelta(days=1)),
        sl.PeriodTime(timedelta(days=7)),
        sl.PeriodTime(timedelta(days=30)),
    ],
)
```
Schema Instantiation and Pipeline Integration
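Instantiating a schema class produces an object whose attributes are references to the schema's fields (for example, `paragraph.body`). Spaces, parsers, and sources accept these field references rather than raw values, which keeps data processing type-safe throughout the framework.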
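```python
from datetime import timedelta

from superlinked import framework as sl


# Schema definition (assumed here, since the example references its fields)
class ParagraphSchema(sl.Schema):
    id: sl.IdField
    body: sl.String
    created_at: sl.Timestamp


# Schema instantiation
paragraph = ParagraphSchema()

# Field access in spaces
text_space = sl.TextSimilaritySpace(
    text=paragraph.body, model="sentence-transformers/all-mpnet-base-v2"
)
recency_space = sl.RecencySpace(
    timestamp=paragraph.created_at,
    period_time_list=[sl.PeriodTime(timedelta(days=30))],
)

# Field access in data parsing: schema field -> DataFrame column name
parser = sl.DataFrameParser(
    paragraph,
    mapping={
        paragraph.id: "index",
        paragraph.created_at: "creation_date",
        paragraph.body: "text_content",
    },
)

# Index creation with multiple spaces
index = sl.Index([text_space, recency_space])

# Application setup: an in-memory source feeding an in-memory executor
source = sl.InMemorySource(paragraph, parser=parser)
executor = sl.InMemoryExecutor(sources=[source], indices=[index])
app = executor.run()
```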
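`paragraph.body` can be passed to a space, and used as a key in the parser `mapping`, because attribute access on a schema instance yields a field reference rather than a data value. The sketch below imitates that behavior in plain Python. `FieldRef` and `SchemaBase` are hypothetical names chosen for illustration and do not mirror Superlinked's internal classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Hypothetical field reference: identifies a schema field but holds no data."""

    schema_name: str
    name: str
    type_name: str


class SchemaBase:
    """Hypothetical base class: instantiation turns each annotation into a FieldRef."""

    def __init__(self) -> None:
        schema_name = type(self).__name__
        for name, field_type in type(self).__annotations__.items():
            setattr(self, name, FieldRef(schema_name, name, field_type.__name__))


class ParagraphSchema(SchemaBase):
    id: str
    body: str
    created_at: int


paragraph = ParagraphSchema()
# Frozen dataclasses are hashable, so field references can key a column mapping.
mapping = {paragraph.id: "index", paragraph.body: "text_content"}
print(paragraph.body)  # FieldRef(schema_name='ParagraphSchema', name='body', type_name='str')
```

Because each reference records which schema and field it belongs to, a consumer such as a space or parser can check that it was handed the right kind of field without ever seeing actual data.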
Data Parsing and Mapping
Schemas integrate with data sources through parsers, such as `sl.DataFrameParser`, that map external columns or keys to schema fields, so every record enters the pipeline in the shape and types the schema declares.
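Conceptually, a parser walks each external record, looks up the column mapped to every schema field, and converts the value to the field's declared type, rejecting records that cannot be converted. The stdlib-only sketch below shows that flow. `parse_rows`, the field dictionary, and the rule that unmapped fields fall back to a column with the same name are assumptions for illustration, not Superlinked's parser API. The mapping direction (schema field to column name) follows the `DataFrameParser` example above.

```python
# Hypothetical contract for a paragraph: schema field name -> Python type
PARAGRAPH_FIELDS = {"id": str, "body": str, "created_at": int}

# Schema field name -> external column name
MAPPING = {"id": "index", "body": "text_content", "created_at": "creation_date"}


def parse_rows(rows, mapping, fields):
    """Map external rows onto schema fields and coerce each value to its declared type."""
    parsed = []
    for row in rows:
        record = {}
        for field_name, field_type in fields.items():
            # Assumption for this sketch: unmapped fields read a column with the same name.
            column = mapping.get(field_name, field_name)
            if column not in row:
                raise ValueError(f"missing column {column!r} for field {field_name!r}")
            # Coercion fails loudly on bad input, e.g. int("soon") raises ValueError.
            record[field_name] = field_type(row[column])
        parsed.append(record)
    return parsed


rows = [{"index": 7, "text_content": "Hello", "creation_date": "1700000000", "extra": "ignored"}]
print(parse_rows(rows, MAPPING, PARAGRAPH_FIELDS))
# [{'id': '7', 'body': 'Hello', 'created_at': 1700000000}]
```

Columns that no field asks for (here `extra`) are simply ignored, which is one reasonable design choice; a stricter parser could reject them instead.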
Event schemas capture behavioral data and interactions that can modify entity embeddings over time through the Event Effects System, enabling dynamic and adaptive recommendations.
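As a simplified illustration of how an event can shift an entity embedding, the sketch below nudges a user vector toward the vector of an item the user interacted with, weighted by the event's strength, and renormalizes the result. The `Event` record, the linear mixing rule, and the weights are hypothetical; the Event Effects System defines its own aggregation, so treat this as a model of the idea rather than the framework's formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: who interacted with what, and how strongly."""

    user_id: str
    item_id: str
    weight: float  # in [0, 1]; e.g. a "like" might weigh more than a "view"


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def apply_events(user_vecs, item_vecs, events):
    """Return new user vectors, each nudged toward the items its events reference."""
    updated = {uid: list(vec) for uid, vec in user_vecs.items()}  # inputs stay untouched
    for ev in events:
        if not 0.0 <= ev.weight <= 1.0:
            raise ValueError("event weight must be in [0, 1]")
        user, item = updated[ev.user_id], item_vecs[ev.item_id]
        mixed = [(1 - ev.weight) * u + ev.weight * i for u, i in zip(user, item)]
        updated[ev.user_id] = normalize(mixed)
    return updated


users = {"u1": [1.0, 0.0]}
items = {"p1": [0.0, 1.0]}
result = apply_events(users, items, [Event("u1", "p1", 0.5)])
# result["u1"] is roughly [0.7071, 0.7071]; users["u1"] is left unchanged
```

Keeping events as separate records, rather than overwriting the user vector directly, is what lets their effect be recomputed or reweighted later.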
The schema system provides type safety at two levels: static checking through Python type annotations, which type checkers and IDEs understand, and runtime validation by the framework when data is processed. Together they catch errors early in development.
```python
from superlinked import framework as sl


# Type-safe field access
class ProductSchema(sl.Schema):
    name: sl.String
    price: sl.Float
    id: sl.IdField


product = ProductSchema()
# product.name is recognized as sl.String by type checkers
# product.price is recognized as sl.Float by type checkers
```
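```python
# Runtime validation (sketch): values are checked against the declared field
# types when records are ingested, not when schema attributes are assigned.
# The exact exception class raised may differ between Superlinked versions.
source = sl.InMemorySource(product)
price_space = sl.NumberSpace(
    number=product.price, min_value=0.0, max_value=1000.0, mode=sl.Mode.MINIMUM
)
app = sl.InMemoryExecutor(sources=[source], indices=[sl.Index([price_space])]).run()

try:
    # "price" is declared as sl.Float, so a non-numeric string is expected to be rejected
    source.put([{"id": "p1", "name": "Sneaker", "price": "invalid_price"}])
except Exception as e:
    print(f"Type validation failed: {e}")

# IDE support: typing `product.` offers the declared fields (id, name, price),
# and each attribute is typed as its schema field class (sl.String, sl.Float, ...)
```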
Type Safety: Strong typing and validation for all data fields with Python type annotation integration
Flexibility: Support for various data types and structures with both inheritance and decorator patterns
Event Handling: Specialized schemas for time-based event data and behavioral analytics
Data Integration: Seamless parsing and mapping from external data sources like DataFrames and JSON
Framework Integration: Native support for spaces, indices, and query operations
Validation: Automatic data validation and error handling through static type checking and runtime checks
Schema definitions serve as the foundation for all data processing operations in Superlinked. Well-defined schemas keep data consistent and type-safe throughout the entire pipeline.