Schema

Schema()
Base class for schemas. Subclass it to describe your structured data. Each schema corresponds to an entity type in the embedding space that you can search for or search by.

Ancestors (in MRO)

  • superlinked.framework.common.schema.id_schema_object.IdSchemaObject
  • abc.ABC

Usage

Basic Schema Definition

Define a simple schema for your data structure:
from superlinked import Schema

class ProductSchema(Schema):
    id: str
    name: str
    description: str
    price: float
    category: str
    in_stock: bool

# Create an instance to use in your application
product_schema = ProductSchema()

Schema with Complex Types

Use more complex type annotations for richer data structures:
from datetime import datetime
from typing import List, Optional
from superlinked import Schema

class UserSchema(Schema):
    user_id: str
    email: str
    name: str
    age: Optional[int]
    tags: List[str]
    created_at: datetime
    is_active: bool

user_schema = UserSchema()

Event-Based Schema

Create schemas for time-series or event data:
from datetime import datetime
from typing import Optional
from superlinked import Schema

class InteractionSchema(Schema):
    user_id: str
    item_id: str
    interaction_type: str
    timestamp: datetime
    rating: Optional[float]

interaction_schema = InteractionSchema()

Schema Integration

With Spaces

Pass schema fields to spaces to define how each field is embedded (product_schema is the instance created in the basic example):
from superlinked import TextSimilaritySpace, CategoricalSimilaritySpace

# Create spaces based on schema fields
text_space = TextSimilaritySpace(
    text=product_schema.description,
    model="sentence-transformers/all-MiniLM-L6-v2"
)

category_space = CategoricalSimilaritySpace(
    category_input=product_schema.category,
    categories=["electronics", "clothing", "books"]
)

With Data Sources

Connect schemas to data sources:
from superlinked import InMemorySource

# Create a data source for the schema
source = InMemorySource(product_schema)

With Indexes

Combine the spaces built from schema fields into an index for querying:
from superlinked import Index

# Create an index combining multiple spaces
product_index = Index([text_space, category_space])

Schema Properties

Once your class inherits from Schema, it gains several capabilities:

Entity Representation

  • Embedding Space Entities: Each schema instance represents an entity type in the vector space
  • Searchable Units: You can search for entities of this schema type or use them as search criteria
  • Type Safety: The schema ensures data consistency and type validation

Field Access

  • Schema Fields: Access individual fields for use in spaces and queries
  • Type Information: Maintain type safety throughout the pipeline
  • Validation: Automatic validation of data against the schema structure
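
To make the idea of validating data against a schema's structure concrete, here is a minimal sketch that checks a record against a class's type annotations using only the standard library. It is not how Superlinked validates data internally, which this page does not describe. ProductFields and check_record are hypothetical names, and ProductFields is a plain class that stands in for a schema.

```python
from typing import Optional, get_args, get_type_hints


class ProductFields:
    """Hypothetical stand-in for a schema class; it does not inherit from Schema."""

    id: str
    name: str
    price: float
    category: Optional[str]


def check_record(cls, record: dict) -> list:
    """Return a list of problems found when comparing record to cls annotations."""
    problems = []
    for field, annotation in get_type_hints(cls).items():
        # Optional[X] and X | None both list NoneType among their type arguments.
        allowed = get_args(annotation) or (annotation,)
        optional = type(None) in allowed
        value = record.get(field)
        if value is None:
            if not optional:
                problems.append(f"missing required field: {field}")
            continue
        # Only plain classes such as str or float are handled in this sketch.
        if not isinstance(value, allowed):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once before ingestion.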

Best Practices

Clear Naming: Use descriptive class and field names that clearly represent your data domain. This improves code readability and makes debugging easier.
Type Annotations: Always include proper type annotations for all fields. This enables better validation and IDE support.
Required Fields: Mark optional fields explicitly with Optional[] or | None. All other fields are considered required.
Schema Evolution: Changes to schema definitions may require rebuilding indexes and reprocessing data. Plan schema changes carefully in production environments.
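
One way to plan a schema change is to compare the old and new field definitions before deploying. The sketch below diffs the type annotations of two classes with the standard library. ProductV1, ProductV2, and diff_fields are hypothetical names, the classes are plain stand-ins rather than Schema subclasses, and whether a given change actually requires a rebuild depends on your setup.

```python
from typing import get_type_hints


class ProductV1:
    """Hypothetical earlier version of a product schema (plain stand-in class)."""

    id: str
    name: str
    price: float


class ProductV2:
    """Hypothetical later version: price changes type and category is new."""

    id: str
    name: str
    price: int
    category: str


def diff_fields(old, new) -> dict:
    """Summarize added, removed, and retyped fields between two classes."""
    old_hints, new_hints = get_type_hints(old), get_type_hints(new)
    shared = old_hints.keys() & new_hints.keys()
    return {
        "added": sorted(new_hints.keys() - old_hints.keys()),
        "removed": sorted(old_hints.keys() - new_hints.keys()),
        "retyped": sorted(f for f in shared if old_hints[f] != new_hints[f]),
    }
```

Any non-empty entry in the result is a signal to review whether existing indexes and stored data still match the new definition.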

Common Patterns

Multiple Schema Relationships

Define related schemas for complex data models:
from superlinked import Schema

class AuthorSchema(Schema):
    author_id: str
    name: str
    biography: str

class BookSchema(Schema):
    book_id: str
    title: str
    author_id: str  # References AuthorSchema
    isbn: str
    publication_year: int

author_schema = AuthorSchema()
book_schema = BookSchema()
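
In the example above, the link between the two schemas exists only as a shared author_id value and a comment. This page does not say whether the framework enforces such references, so a pre-ingestion check may be useful. The sketch below works on plain dictionaries; find_orphans and the sample records are hypothetical.

```python
def find_orphans(books: list, authors: list) -> list:
    """Return the book_id of every book whose author_id matches no author."""
    known_authors = {author["author_id"] for author in authors}
    return [book["book_id"] for book in books if book["author_id"] not in known_authors]


# Illustrative records shaped like AuthorSchema and BookSchema.
sample_authors = [
    {"author_id": "a1", "name": "Author One", "biography": "Example biography."},
]
sample_books = [
    {"book_id": "b1", "title": "First Book", "author_id": "a1",
     "isbn": "000-0", "publication_year": 2020},
    {"book_id": "b2", "title": "Second Book", "author_id": "a9",
     "isbn": "000-1", "publication_year": 2021},
]

orphans = find_orphans(sample_books, sample_authors)
```

Here orphans contains "b2", because no author record carries the id "a9".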

Hierarchical Data

Structure schemas for hierarchical or nested data:
from superlinked import Schema
from typing import Optional

class CategorySchema(Schema):
    category_id: str
    name: str
    parent_category_id: Optional[str]

class ProductSchema(Schema):
    product_id: str
    name: str
    category_id: str  # References CategorySchema
    subcategory_id: Optional[str]
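
With a parent_category_id field, the full hierarchy has to be reconstructed by following parent links. The sketch below resolves a category's path from the root using plain dictionaries shaped like CategorySchema records. category_path and the sample data are hypothetical and not part of the Superlinked API.

```python
def category_path(category_id: str, categories_by_id: dict) -> list:
    """Return category names from the root down to the given category."""
    path = []
    seen = set()
    current = category_id
    while current is not None:
        if current in seen:
            # Guard against records whose parent links form a loop.
            raise ValueError(f"cycle detected at category {current}")
        seen.add(current)
        node = categories_by_id[current]
        path.append(node["name"])
        current = node.get("parent_category_id")
    # The walk goes child to root, so reverse it for a root-first path.
    return list(reversed(path))


# Illustrative records keyed by category_id.
sample_categories = {
    "c1": {"category_id": "c1", "name": "Electronics", "parent_category_id": None},
    "c2": {"category_id": "c2", "name": "Audio", "parent_category_id": "c1"},
    "c3": {"category_id": "c3", "name": "Headphones", "parent_category_id": "c2"},
}

headphones_path = category_path("c3", sample_categories)
```

A root-first path like this could be joined into a single string and stored on the product record if you want the whole hierarchy available as one field.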