Schema
Inherit from this class to define a schema that represents your structured data.
Schemas translate to entities in the embedding space that you can search by or search for.
Ancestors (in MRO)
- superlinked.framework.common.schema.id_schema_object.IdSchemaObject
- abc.ABC
Usage
Basic Schema Definition
Define a simple schema for your data structure:
```python
from superlinked import Schema


class ProductSchema(Schema):
    id: str
    name: str
    description: str
    price: float
    category: str
    in_stock: bool


# Create an instance to use in your application
product_schema = ProductSchema()
```
Schema with Complex Types
Use more complex type annotations for richer data structures:
```python
from datetime import datetime
from typing import List, Optional

from superlinked import Schema


class UserSchema(Schema):
    user_id: str
    email: str
    name: str
    age: Optional[int]
    tags: List[str]
    created_at: datetime
    is_active: bool


user_schema = UserSchema()
```
Event-Based Schema
Create schemas for time-series or event data:
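```python
# datetime and Optional are used in the annotations below, so import them
from datetime import datetime
from typing import Optional

from superlinked import Schema


class InteractionSchema(Schema):
    user_id: str
    item_id: str
    interaction_type: str
    timestamp: datetime
    rating: Optional[float]


interaction_schema = InteractionSchema()
```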
Schema Integration
With Spaces
Use schemas to define vector spaces:
```python
from superlinked import TextSimilaritySpace, CategoricalSimilaritySpace

# Create spaces based on schema fields (product_schema from the basic example)
text_space = TextSimilaritySpace(
    text=product_schema.description,
    model="sentence-transformers/all-MiniLM-L6-v2",
)
category_space = CategoricalSimilaritySpace(
    category_input=product_schema.category,
    categories=["electronics", "clothing", "books"],
)
```
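Conceptually, a categorical space maps each category value to a vector over the declared categories, so items in the same category score as similar and items in different categories do not. The plain-Python sketch below illustrates that idea with a simplified one-hot encoding; `one_hot` and `dot` are hypothetical helpers, and Superlinked's own encoding may differ, for example in how it treats values outside the declared list.

```python
from typing import List

CATEGORIES = ["electronics", "clothing", "books"]


def one_hot(value: str, categories: List[str]) -> List[float]:
    """Hypothetical encoder: 1.0 at the matching category, 0.0 elsewhere (unknown -> all zeros)."""
    return [1.0 if value == c else 0.0 for c in categories]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


laptop = one_hot("electronics", CATEGORIES)
phone = one_hot("electronics", CATEGORIES)
novel = one_hot("books", CATEGORIES)

print(laptop)              # [1.0, 0.0, 0.0]
print(dot(laptop, phone))  # 1.0 (same category)
print(dot(laptop, novel))  # 0.0 (different categories)
```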
With Data Sources
Connect schemas to data sources:
```python
from superlinked import InMemorySource

# Create a data source for the schema
source = InMemorySource(product_schema)
```
With Indexes
Organize schemas in indexes for querying:
```python
from superlinked import Index

# Create an index combining multiple spaces
product_index = Index([text_space, category_space])
```
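One way to picture an index over several spaces is to normalize each space's vector and concatenate the parts; a query can then weight each space, and the dot product becomes a weighted sum of per-space similarities. The sketch below illustrates this general technique with made-up vectors; `normalize`, `combine`, `score`, and the weights are hypothetical and do not describe how `Index` is implemented internally.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def combine(parts: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Normalize each per-space vector, scale it by its weight, and concatenate."""
    out: List[float] = []
    for part, weight in zip(parts, weights):
        out.extend(weight * x for x in normalize(part))
    return out


def score(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two concatenated vectors."""
    return sum(x * y for x, y in zip(a, b))


# Made-up per-space vectors: (text embedding, category one-hot)
laptop = combine([[3.0, 4.0], [1.0, 0.0, 0.0]], weights=[1.0, 1.0])
novel = combine([[3.0, 4.0], [0.0, 0.0, 1.0]], weights=[1.0, 1.0])

# A query that weights text fully and category at half strength
query = combine([[3.0, 4.0], [1.0, 0.0, 0.0]], weights=[1.0, 0.5])

print(round(score(laptop, query), 6))  # 1.5: text match 1.0 + category match 0.5
print(round(score(novel, query), 6))   # 1.0: text match only
```

Because each part is normalized before weighting and the item weights are left at 1, each query weight directly scales how much its space contributes to the final score.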
Schema Properties
Once your class inherits from Schema and is instantiated, it provides the following capabilities:
Entity Representation
- Embedding Space Entities: Each schema instance represents an entity type in the vector space
- Searchable Units: You can search for entities of this schema type or use them as search criteria
- Type Safety: The schema ensures data consistency and type validation
Field Access
- Schema Fields: Access individual fields for use in spaces and queries
- Type Information: Maintain type safety throughout the pipeline
- Validation: Automatic validation of data against the schema structure
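Field access such as product_schema.description works because the schema instance exposes each annotated attribute as a field object that spaces and queries can reference. The stdlib sketch below shows one way such a pattern can be built from class annotations; `SchemaBase`, `SchemaField`, and `fields()` are hypothetical names, and Superlinked's actual mechanism may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, get_type_hints


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical reference to one schema field: owning schema, field name, declared type."""
    schema_name: str
    name: str
    type_: Any


class SchemaBase:
    """Hypothetical base class that turns class annotations into SchemaField attributes."""

    def __init__(self) -> None:
        for name, type_ in get_type_hints(type(self)).items():
            setattr(self, name, SchemaField(type(self).__name__, name, type_))

    def fields(self) -> Dict[str, SchemaField]:
        """Return all field references keyed by field name."""
        return {k: v for k, v in vars(self).items() if isinstance(v, SchemaField)}


class ProductSchema(SchemaBase):
    id: str
    description: str
    price: float


product_schema = ProductSchema()
print(product_schema.description)
# SchemaField(schema_name='ProductSchema', name='description', type_=<class 'str'>)
print(sorted(product_schema.fields()))  # ['description', 'id', 'price']
```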
Best Practices
Clear Naming: Use descriptive class and field names that clearly represent
your data domain. This improves code readability and makes debugging easier.
Type Annotations: Always include proper type annotations for all fields.
This enables better validation and IDE support.
Required Fields: Mark optional fields explicitly with Optional[T] or T | None. All other fields are considered required.
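The required/optional rule can be checked by inspecting the annotations themselves. The following stdlib sketch assumes records arrive as plain dicts and uses a plain class as a stand-in for a Schema subclass; `is_optional` and `missing_required` are hypothetical helpers, not part of Superlinked. The helper also recognizes T | None annotations on Python 3.10 and later.

```python
import types
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints

# typing.Union covers Optional[T]; types.UnionType (Python 3.10+) covers T | None
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_optional(tp: Any) -> bool:
    """True if the annotation is a union that includes None."""
    return get_origin(tp) in _UNION_ORIGINS and type(None) in get_args(tp)


def missing_required(schema_cls: type, record: Dict[str, Any]) -> List[str]:
    """Names of non-optional fields that are absent from, or None in, the record."""
    return [
        name
        for name, tp in get_type_hints(schema_cls).items()
        if not is_optional(tp) and record.get(name) is None
    ]


class UserSchema:  # plain stand-in for a Schema subclass
    user_id: str
    email: str
    age: Optional[int]


print(missing_required(UserSchema, {"user_id": "u1", "email": "a@b.c"}))  # []
print(missing_required(UserSchema, {"user_id": "u1", "age": 30}))  # ['email']
```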
Schema Evolution: Changes to schema definitions may require rebuilding
indexes and reprocessing data. Plan schema changes carefully in production
environments.
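Before changing a schema in production, it helps to list exactly what differs between the old and new definitions. The sketch below compares the annotations of two plain classes using only the standard library; `diff_schemas` is a hypothetical helper, and whether a given change actually forces a rebuild depends on your spaces and index, not on this check.

```python
from typing import Any, Dict, get_type_hints


def diff_schemas(old: type, new: type) -> Dict[str, Any]:
    """Report fields added, removed, or retyped between two schema classes."""
    old_hints, new_hints = get_type_hints(old), get_type_hints(new)
    return {
        "added": sorted(new_hints.keys() - old_hints.keys()),
        "removed": sorted(old_hints.keys() - new_hints.keys()),
        "retyped": sorted(
            name
            for name in old_hints.keys() & new_hints.keys()
            if old_hints[name] != new_hints[name]
        ),
    }


class ProductV1:
    id: str
    name: str
    price: float


class ProductV2:
    id: str
    name: str
    price: int  # type changed
    brand: str  # new field


print(diff_schemas(ProductV1, ProductV2))
# {'added': ['brand'], 'removed': [], 'retyped': ['price']}
```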
Common Patterns
Multiple Schema Relationships
Define related schemas for complex data models:
```python
from superlinked import Schema


class AuthorSchema(Schema):
    author_id: str
    name: str
    biography: str


class BookSchema(Schema):
    book_id: str
    title: str
    author_id: str  # References AuthorSchema
    isbn: str
    publication_year: int


author_schema = AuthorSchema()
book_schema = BookSchema()
```
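Declared as a plain str, author_id simply holds an ID; the comment documents the relationship. One way to resolve such a reference in application code is a lookup by ID, sketched below with plain dicts and made-up records. `attach_authors` is a hypothetical helper, not a Superlinked feature.

```python
from typing import Dict, List, Optional

# Made-up records keyed the way the schemas above describe them
authors: Dict[str, dict] = {
    "a1": {"author_id": "a1", "name": "Ada Writer"},
}

books: List[dict] = [
    {"book_id": "b1", "title": "First Book", "author_id": "a1"},
    {"book_id": "b2", "title": "Orphan Book", "author_id": "a9"},  # unknown author
]


def attach_authors(books: List[dict], authors: Dict[str, dict]) -> List[dict]:
    """Return copies of the book records with the referenced author (or None) attached."""
    joined = []
    for book in books:
        author: Optional[dict] = authors.get(book["author_id"])
        joined.append({**book, "author": author})
    return joined


for row in attach_authors(books, authors):
    print(row["title"], "->", row["author"]["name"] if row["author"] else "missing author")
# First Book -> Ada Writer
# Orphan Book -> missing author
```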
Hierarchical Data
Structure schemas for hierarchical or nested data:
```python
from typing import Optional

from superlinked import Schema


class CategorySchema(Schema):
    category_id: str
    name: str
    parent_category_id: Optional[str]


class ProductSchema(Schema):
    product_id: str
    name: str
    category_id: str  # References CategorySchema
    subcategory_id: Optional[str]


# Create instances, as in the other examples
category_schema = CategorySchema()
product_schema = ProductSchema()
```
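With parent_category_id stored on each category, the full path of a category can be rebuilt by following parent links until a root (None) is reached. The sketch below does this over made-up records in plain Python; `category_path` is a hypothetical helper, and the visited set guards against accidental cycles in the data.

```python
from typing import Dict, List, Optional

# Made-up category records keyed by category_id
categories: Dict[str, dict] = {
    "c1": {"category_id": "c1", "name": "Electronics", "parent_category_id": None},
    "c2": {"category_id": "c2", "name": "Computers", "parent_category_id": "c1"},
    "c3": {"category_id": "c3", "name": "Laptops", "parent_category_id": "c2"},
}


def category_path(category_id: str, categories: Dict[str, dict]) -> List[str]:
    """Return category names from the root down to the given category."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = category_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at category {current!r}")
        seen.add(current)
        record = categories[current]
        path.append(record["name"])
        current = record["parent_category_id"]
    return list(reversed(path))


print(" > ".join(category_path("c3", categories)))  # Electronics > Computers > Laptops
```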