Basic Building Blocks

Learn the Superlinked lingo.

Last updated 3 months ago

Intro

Superlinked's framework is built on key components: @schema, Source, Spaces, Index, Query, and Executor. These building blocks allow you to create a modular system tailored to your specific use cases.

You begin by defining your desired endpoints - how you want your embeddings to represent your data. This guides your system setup, allowing you to customize your modules before running queries. You can adjust query weights for different scenarios, such as a user's interests or recent items.

This modular approach separates query description from execution, enabling you to run the same query across different environments without reimplementation. You build your Query using descriptive elements like @schema, Source, Spaces, Index, or Event, which can be reused with different Executors.
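To make the separation between describing a query and executing it concrete, here is a minimal sketch in plain Python. It does not use Superlinked: `ToyQuery`, `ListExecutor`, and `DictExecutor` are hypothetical names invented for illustration, and simple substring matching stands in for vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyQuery:
    """Hypothetical query description: which field to search, which parameter to read."""
    field: str
    param_name: str


class ListExecutor:
    """Hypothetical executor that scans a list of records."""

    def __init__(self, records):
        self.records = list(records)

    def run(self, query, **params):
        needle = params[query.param_name].lower()
        return [r["id"] for r in self.records if needle in r[query.field].lower()]


class DictExecutor:
    """Hypothetical executor backed by an id -> record mapping."""

    def __init__(self, records_by_id):
        self.records_by_id = dict(records_by_id)

    def run(self, query, **params):
        needle = params[query.param_name].lower()
        return [
            record_id
            for record_id, r in self.records_by_id.items()
            if needle in r[query.field].lower()
        ]


records = [
    {"id": "happy_dog", "body": "That is a happy dog"},
    {"id": "sunny_day", "body": "Today is a sunny day"},
]

# The query is described once, with its parameter left unfilled...
query = ToyQuery(field="body", param_name="query_text")

# ...and then executed unchanged against two different backends.
list_hits = ListExecutor(records).run(query, query_text="happy")
dict_hits = DictExecutor({r["id"]: r for r in records}).run(query, query_text="happy")
print(list_hits, dict_hits)
```

The query object carries no knowledge of where the data lives, which is the property that lets the same description be reused across executors.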

Superlinked's focus on connectors facilitates easy transitions between deployments, from in-memory to batch or real-time data pipelines. This flexibility allows for rapid experimentation in embedding and retrieval while maximizing control over index creation.

Let's explore these building blocks in more detail.

Follow along in this Colab.

Turning classes into Schemas

Once you’ve parsed data into your notebook via JSON or a pandas dataframe, it’s time to create a Schema describing your data.

To do this, you annotate your class as a schema representing your structured data, using @schema or, as in the example below, by subclassing sl.Schema. Schemas translate to searchable entities in the embedding space. To get started, declare your schema class, and then define the field types to match the different types of data you’ve imported.

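from superlinked import framework as sl

class ParagraphSchema(sl.Schema):
    body: sl.String  # text to embed
    id: sl.IdField   # unique identifier of each record

# Instance referenced by the Space, Query, and Source definitions
paragraph = ParagraphSchema()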

With your Schemas created, you are ready to move on to embedding and querying, which is where Superlinked’s building blocks approach really empowers you. The Superlinked framework is based on the intuition that people doing semantic search can better satisfy the requirements of their use case/s if they can customize how their system handles data and queries.

Declaring how to embed your data using Spaces

Spaces is a declarative class developed with this in mind. The Space module encapsulates the vector creation logic that will be used at ingestion time, and again at query time.

Spaces lets you tailor how you embed different attributes of your data and can be categorized along 2 key dimensions:

  1. what input types the Space permits - e.g., text, timestamp, numeric, categorical

  2. whether the Space represents similarity (e.g., TextSimilaritySpace) or scale (e.g., NumberSpace)

By prioritizing the creation of smarter vectors up front - and only then creating the index - we can achieve better quality retrieval, without costly and time-consuming reranking and general post-processing work.

relevance_space = sl.TextSimilaritySpace(text=paragraph.body, model="Snowflake/snowflake-arctic-embed-s")
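The difference between a similarity Space and a scale Space can be illustrated without the framework. The sketch below is a toy under stated assumptions, not Superlinked's actual encoding: `encode_on_scale` is a hypothetical helper that maps a number onto a quarter circle, so that the dot product with a "prefer the maximum" vector grows as the value grows.

```python
import math


def encode_on_scale(value, min_value, max_value):
    """Hypothetical scale encoder: clamp, normalize to [0, 1], map to a quarter circle."""
    clamped = min(max(value, min_value), max_value)
    fraction = (clamped - min_value) / (max_value - min_value)
    angle = fraction * math.pi / 2
    return (math.sin(angle), math.cos(angle))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Query vector meaning "prefer the largest value" (assumed convention for this sketch).
prefer_max = encode_on_scale(100, 0, 100)

# Made-up review counts on an assumed 0..100 scale.
review_counts = [5, 40, 95]
scores = [dot(encode_on_scale(c, 0, 100), prefer_max) for c in review_counts]
print(scores)
```

Every encoded vector has unit length, so the score reflects only where the value sits on the scale, whereas a similarity Space scores how close two inputs are to each other.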

Indexing

Superlinked’s Index module components enable you to group Spaces into indices that make your queries more efficient.

paragraph_index = sl.Index(relevance_space)

Executing your query to your chosen endpoints

Before running your code, you need to structure your query using the following arguments:

  • .find: tells it what to look for

  • .select_all: returns all the stored fields; without this clause, the query returns only the id(s) (details in this notebook: https://github.com/superlinked/superlinked/blob/main/notebook/feature/query_result.ipynb)

query = (
    sl.Query(paragraph_index)
    .find(paragraph)
    # query_text is a Param, so its value can be supplied when the query runs
    .similar(relevance_space.text, sl.Param("query_text"))
    .select_all()
)

Once you’ve defined your schema and built out the structure of your Index and Query, it’s time to connect everything.

Use Source to connect your data to the schema.

source: sl.InMemorySource = sl.InMemorySource(paragraph)

Now that you’ve connected data with schema, you use the Executor to prepare your code to run. The Executor connects the source data with the index, which describes how each part of the data should be treated within Spaces.

executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

Experimenting with some sample data

Now we can insert some sample data...

source.put([{"id": "happy_dog", "body": "That is a happy dog"}])
source.put([{"id": "happy_person", "body": "That is a very happy person"}])
source.put([{"id": "sunny_day", "body": "Today is a sunny day"}])

...and query it to see what it produces.

result = app.query(query, query_text="This is a happy person")
sl.PandasConverter.to_pandas(result)

Here's our result.

   body                         id
0  That is a very happy person  happy_person
1  That is a happy dog          happy_dog
2  Today is a sunny day         sunny_day

Changing the query text further demonstrates how our system produces results that are relevant to each query.

result = app.query(query, query_text="This is a happy dog")
sl.PandasConverter.to_pandas(result)

   body                         id
0  That is a happy dog          happy_dog
1  That is a very happy person  happy_person
2  Today is a sunny day         sunny_day
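To build intuition for why the order changes with the query text, here is a small self-contained sketch. It is not how Superlinked or the Snowflake embedding model works internally: word counts stand in for a real embedding, and `embed`, `cosine`, and `rank` are hypothetical helpers written for this illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    shared = set(a) & set(b)
    numerator = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return numerator / (norm_a * norm_b)


documents = {
    "happy_dog": "That is a happy dog",
    "happy_person": "That is a very happy person",
    "sunny_day": "Today is a sunny day",
}


def rank(query_text):
    """Return document ids ordered from most to least similar to the query."""
    query_vec = embed(query_text)
    return sorted(
        documents,
        key=lambda doc_id: cosine(query_vec, embed(documents[doc_id])),
        reverse=True,
    )


print(rank("This is a happy person"))
print(rank("This is a happy dog"))
```

On these three sentences the toy ranking gives the same order as the two result tables: the document sharing the most words with the query comes first. A real embedding model captures meaning beyond word overlap, so this agreement should not be expected in general.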

In sum

The Superlinked framework lets you create and tailor a modular system that fits your use case/s and can be repurposed for different deployments, saving you from resource- and time-consuming reimplementation, reranking, and postprocessing.

Which Space/s fit your use case depends on both dimensions of a Space: your input type and what you need to represent about your data. You can find a list of Spaces in the Superlinked reference documentation.

You use different Spaces for different data types. For example, TextSimilaritySpace and CategoricalSimilaritySpace can take String as an input, RecencySpace can take Timestamp, NumberSpace can take Int/Float, and so on. Each Space captures a different, relevant piece of information (e.g., title, review count, etc.) about an entity. This lets you weight each Space according to which attributes are relevant to your use case, before you concatenate all your Spaces into a single multimodal vector among others in the queryable vector space.
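One way to picture weighting and concatenating Spaces is sketched below in plain Python. This is an illustrative assumption rather than Superlinked's internal implementation: `normalize`, `combine`, and the example vectors are made up. Each per-Space vector is unit-normalized, scaled by its weight, and concatenated, so the dot product of two combined vectors is a weighted sum of the per-Space similarities.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(space_vectors, weights):
    """Concatenate unit-normalized per-space vectors, each scaled by its weight."""
    combined = []
    for name, vec in space_vectors.items():
        combined.extend(weights[name] * x for x in normalize(vec))
    return combined


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up per-space vectors for one stored item and one query.
item = {"text": [1.0, 0.0], "recency": [0.6, 0.8]}
query = {"text": [1.0, 0.0], "recency": [0.0, 1.0]}

# In this sketch, weights are applied on the query side only; the item uses weight 1.
item_vec = combine(item, {"text": 1.0, "recency": 1.0})
text_heavy = dot(item_vec, combine(query, {"text": 1.0, "recency": 0.1}))
recency_heavy = dot(item_vec, combine(query, {"text": 0.1, "recency": 1.0}))
print(text_heavy, recency_heavy)
```

Changing only the weights changes the score of the same stored item, without re-embedding it.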

When structuring a query, the following also apply:

  • Query: defines the index you want to search, and you can add Params here (details in our notebook)

  • .similar: tells it how to identify relevant results (details in the notebook)

Note that you can wait to fill out the specific Params until later. You can also add a .with_vector to search with an embedded vector of a specific element of your data (see details in the notebook).

Now for the fun part - try it out yourself! The notebook is available on Google Colab. Experiment with your own sample data and query inputs, and give us a star!
