Basic Building Blocks
Learn the Superlinked lingo.
Superlinked's framework is built on six key components: Schema, Space, Index, Query, Source, and Executor. These building blocks allow you to create a modular system tailored to your specific use cases.
You begin by defining your desired endpoints - how you want your embeddings to represent your data. This guides your system setup, allowing you to customize your modules before running queries. You can adjust query weights for different scenarios, such as a user's interests or recent items.
This modular approach separates query description from execution, enabling you to run the same query across different environments without reimplementation. You build your Query using descriptive elements like @schema, Source, Spaces, Index, or Event, which can be reused with different Executors.
Superlinked's focus on connectors facilitates easy transitions between deployments, from in-memory to batch or real-time data pipelines. This flexibility allows for rapid experimentation in embedding and retrieval while maximizing control over index creation.
Let's explore these building blocks in more detail.
Follow along in this Colab.
Once you've parsed data into your notebook via JSON or a pandas DataFrame, it's time to create a Schema describing your data.
To do this, you use the Schema decorator to annotate your class as a schema representing your structured data. Schemas translate to searchable entities in the embedding space. To get started, type @schema, and then define the field types to match the different types of data you've imported.
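What a schema decorator does conceptually can be sketched with the standard library alone. This is not Superlinked's `@schema`: the decorator, the `String`/`Float`/`IdField` markers, and the `Product` class below are hypothetical stand-ins. The sketch reads the class's type annotations, records them as the entity's fields, and (as an assumption of this toy) requires exactly one ID field.

```python
from typing import get_type_hints

# Hypothetical field-type markers standing in for a library's field types.
class String: ...
class Float: ...
class IdField: ...

def schema(cls):
    """Toy decorator: record each annotated attribute as a schema field."""
    fields = get_type_hints(cls)
    id_fields = [name for name, field_type in fields.items() if field_type is IdField]
    if len(id_fields) != 1:
        raise TypeError(f"{cls.__name__} needs exactly one IdField")
    cls._schema_fields = fields
    cls._id_field = id_fields[0]
    return cls

@schema
class Product:
    id: IdField
    title: String
    price: Float

print(Product._id_field)             # id
print(list(Product._schema_fields))  # ['id', 'title', 'price']
```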
With your Schemas created, you are ready to move on to embedding and querying, which is where Superlinked's building-blocks approach really empowers you. The Superlinked framework is based on the intuition that people doing semantic search can better satisfy the requirements of their use case/s if they can customize how their system handles data and queries.
Spaces is a declarative class developed with this in mind. The Space module encapsulates the vector creation logic that will be used at ingestion time, and again at query time.
Spaces let you tailor how you embed different attributes of your data. They can be categorized along two key dimensions:
what input types the Space permits (e.g., text, timestamp, numeric, categorical)
whether the Space represents similarity (e.g., TextSimilaritySpace) or scale (e.g., a numeric space)
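To make the two dimensions concrete, here is a toy, stdlib-only sketch of one similarity-type space and one scale-type space. The class names and encodings are hypothetical, not Superlinked's implementation (its real text spaces use embedding models): `ToyTextSimilaritySpace` turns text into normalized word counts over a fixed vocabulary, and `ToyNumberSpace` maps a number in an assumed `[min_value, max_value]` range onto a quarter circle, so a query direction of "as large as possible" scores larger values higher. Both apply the same encoding at ingestion time and at query time.

```python
import math

def unit(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class ToyTextSimilaritySpace:
    """Similarity-type space: texts sharing vocabulary words get nearby vectors."""
    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    def encode(self, text: str) -> list[float]:
        words = text.lower().split()
        return unit([float(words.count(w)) for w in self.vocabulary])

class ToyNumberSpace:
    """Scale-type space: maps a number in [min_value, max_value] onto a quarter circle."""
    def __init__(self, min_value: float, max_value: float):
        self.min_value, self.max_value = min_value, max_value

    def encode(self, value: float) -> list[float]:
        clamped = min(max(value, self.min_value), self.max_value)
        fraction = (clamped - self.min_value) / (self.max_value - self.min_value)
        angle = fraction * math.pi / 2
        return [math.cos(angle), math.sin(angle)]

text = ToyTextSimilaritySpace(["happy", "dog", "sunny", "day"])
q = text.encode("happy dog")
print(dot(q, text.encode("That is a happy dog")) > dot(q, text.encode("Today is a sunny day")))  # True

rating = ToyNumberSpace(0, 5)
prefer_high = [0.0, 1.0]  # query direction meaning "as large as possible"
print(dot(prefer_high, rating.encode(4)) > dot(prefer_high, rating.encode(2)))  # True
```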
By prioritizing the creation of smarter vectors up front - and only then creating the index - we can achieve better quality retrieval, without costly and time-consuming reranking and general post-processing work.
Superlinked's Index module components enable you to group Spaces into indices that make your queries more efficient.
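The grouping idea behind an index can be sketched in a few lines: each space binds one schema field to an encoder, and the index lays the per-space vectors end to end, so every entity gets one vector with the same layout. `ToySpace`, `ToyIndex`, and both encoders are hypothetical stand-ins, not Superlinked's classes; the title encoder uses a made-up three-word vocabulary and the rating encoder assumes a maximum rating of 5.

```python
from typing import Callable

class ToySpace:
    """Hypothetical stand-in for a Space: binds one schema field to an encoder."""
    def __init__(self, field: str, encode: Callable[[object], list[float]]):
        self.field = field
        self.encode = encode

class ToyIndex:
    """Hypothetical stand-in for an Index: groups spaces into one vector layout."""
    def __init__(self, *spaces: ToySpace):
        self.spaces = spaces

    def vector_for(self, entity: dict) -> list[float]:
        # Concatenate per-space vectors in a fixed order so all stored vectors line up.
        vec: list[float] = []
        for space in self.spaces:
            vec.extend(space.encode(entity[space.field]))
        return vec

def title_encoder(text):
    words = text.lower().split()
    return [1.0 if w in words else 0.0 for w in ("shoes", "jacket", "hat")]

def rating_encoder(rating):
    return [rating / 5.0]  # assumes ratings run from 0 to 5

index = ToyIndex(ToySpace("title", title_encoder), ToySpace("rating", rating_encoder))
print(index.vector_for({"id": "p1", "title": "Trail running shoes", "rating": 4}))
# [1.0, 0.0, 0.0, 0.8]
```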
Before running your code, you need to structure your query using the following clauses:
.find: tells it what to look for
.select_all: returns all the stored fields; without this clause, only the id(s) are returned (details in this notebook: https://github.com/superlinked/superlinked/blob/main/notebook/feature/query_result.ipynb)
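The benefit of describing a query separately from running it can be sketched with a toy, immutable query builder in plain Python. `ToyQuery`, `Param`, `bind`, and the index and schema names are hypothetical stand-ins whose methods only mirror the clauses listed above; they are not Superlinked's classes. Each call returns a new description, and nothing executes until the `Param` placeholders are resolved.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Param:
    """Placeholder whose value is supplied only when the query is run."""
    name: str

@dataclass(frozen=True)
class ToyQuery:
    """Immutable query description: building it runs nothing."""
    index_name: str
    schema_name: Optional[str] = None
    similar_clauses: tuple = ()
    return_all_fields: bool = False

    def find(self, schema_name: str) -> "ToyQuery":
        return replace(self, schema_name=schema_name)

    def similar(self, field: str, value) -> "ToyQuery":
        return replace(self, similar_clauses=self.similar_clauses + ((field, value),))

    def select_all(self) -> "ToyQuery":
        return replace(self, return_all_fields=True)

    def bind(self, **params) -> list:
        """Resolve Param placeholders to concrete values, as an executor would at run time."""
        return [
            (field, params[value.name] if isinstance(value, Param) else value)
            for field, value in self.similar_clauses
        ]

# Hypothetical index and schema names.
query = (
    ToyQuery("paragraph_index")
    .find("Paragraph")
    .similar("body", Param("query_text"))
    .select_all()
)
print(query.bind(query_text="This is a happy person"))
# [('body', 'This is a happy person')]
```

Because the description is immutable, the same `query` object could be handed to different executors or bound with different parameter values.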
Once you've defined your schema and built out the structure of your Index and Query, it's time to connect everything.
Use Source to connect your data to the schema.
Now that you've connected your data to the schema, you use the Executor to prepare your code to run. The Executor connects the source data with the index, which describes how each part of the data should be treated within Spaces.
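An executor's job (tie ingested records to the index's encoding, then answer queries with that same encoding) can be sketched in memory with the standard library only. Everything here is hypothetical: `ToyInMemoryExecutor`, its `put`/`query` methods, and the word-count `text_vector` encoder stand in for Superlinked's classes and a real embedding model. The sample records reuse the texts and ids from the results in this section; the toy's ranking comes from its word-count encoder and is not a prediction of what a real model returns.

```python
import math

VOCABULARY = ("happy", "dog", "person", "sunny", "day")

def text_vector(text):
    """Toy stand-in for an embedding model: normalized word counts over a fixed vocabulary."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCABULARY]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyInMemoryExecutor:
    """Hypothetical stand-in: ingests records through one encoder and answers queries with it."""

    def __init__(self, encode, text_field="body"):
        self.encode = encode
        self.text_field = text_field
        self.records = {}  # id -> stored fields
        self.vectors = {}  # id -> vector computed at ingestion time

    def put(self, records):
        # Plays the role of a Source pushing data into the system.
        for record in records:
            self.records[record["id"]] = record
            self.vectors[record["id"]] = self.encode(record[self.text_field])

    def query(self, query_text, limit=3):
        # The same encoder runs at query time; results are ranked by dot product.
        q = self.encode(query_text)
        scores = {rid: sum(a * b for a, b in zip(q, vec)) for rid, vec in self.vectors.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [self.records[rid] for rid in ranked[:limit]]

app = ToyInMemoryExecutor(text_vector)
app.put([
    {"id": "happy_dog", "body": "That is a happy dog"},
    {"id": "happy_person", "body": "That is a very happy person"},
    {"id": "sunny_day", "body": "Today is a sunny day"},
])
print([r["id"] for r in app.query("This is a happy person")])
# ['happy_person', 'happy_dog', 'sunny_day']
```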
Now we can insert some sample data...
...and query it to see what it produces.
Here's our result.
0 | That is a very happy person | happy_person
1 | That is a happy dog | happy_dog
2 | Today is a sunny day | sunny_day
Changing the query text further demonstrates how our system produces results that are relevant to each query.
0 | That is a happy dog | happy_dog
1 | That is a very happy person | happy_person
2 | Today is a sunny day | sunny_day
In sum, the Superlinked framework lets you build and tailor a modular system that fits your use case/s and can be reused across different deployments, saving you resource- and time-consuming reimplementation, reranking, and post-processing.
Which Space/s fit your use case depends on both of the dimensions described above: your input type and what you need to represent about your data (similarity or scale). You can find a list of the available Spaces in the Superlinked documentation.
You use different Spaces for different data types. For example, TextSimilaritySpace and CategoricalSimilaritySpace can take String as an input, RecencySpace can take Timestamp, NumberSpace can take Int/Float, and so on. Each Space captures a different, relevant piece of information (e.g., title, review count) about an entity. This lets you weight each Space according to which attributes are relevant to your use case, before all your Spaces are concatenated into a single multimodal vector per entity, stored alongside the others in the queryable vector space.
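Why per-space weights can act on a single concatenated vector comes down to a little arithmetic. If a stored vector is d = [d_1, ..., d_k] (one segment per Space) and the query vector is built as q = [w_1 q_1, ..., w_k q_k], then q · d = w_1 (q_1 · d_1) + ... + w_k (q_k · d_k): the score is a weighted sum of per-space similarities. The stdlib-only sketch below shows one simple way to realize this; the item vectors, weights, and function names are hypothetical, not Superlinked's API. Shifting weight from a title space to a review-count space flips the ranking.

```python
def weighted_query(segments, weights):
    """Concatenate per-space query segments, scaling each segment by its weight."""
    vec = []
    for segment, weight in zip(segments, weights):
        vec.extend(weight * x for x in segment)
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical stored vectors: title segment (2 dims) followed by review-count segment (1 dim).
items = {
    "relevant_title": [1.0, 0.0] + [0.2],  # strong title match, few reviews
    "popular_item": [0.0, 1.0] + [1.0],    # weak title match, many reviews
}
title_query = [1.0, 0.0]   # query segment for the title space
reviews_query = [1.0]      # query segment meaning "more reviews is better"

def rank(weights):
    q = weighted_query([title_query, reviews_query], weights)
    return sorted(items, key=lambda name: dot(q, items[name]), reverse=True)

print(rank((1.0, 0.5)))  # ['relevant_title', 'popular_item']
print(rank((0.2, 1.0)))  # ['popular_item', 'relevant_title']
```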
In addition to .find and .select_all, the query structure includes:
Query: defines the index you want to search; you can also add Params here
.similar: tells it how to identify relevant results
Note that you can wait to fill in the specific Params until later. You can also add .with_vector to search with the embedded vector of a specific element of your data.
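Searching with the embedded vector of a specific element means reusing a vector already stored at ingestion time as the query vector. The stdlib-only sketch below illustrates that idea with made-up 4-dimensional vectors (dimensions loosely standing for "happy", "dog", "person", "sunny"); it is not Superlinked's `.with_vector` implementation, and excluding the item itself from its own results is a design choice of this sketch rather than documented behavior.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vectors already produced at ingestion time (id -> vector).
stored = {
    "happy_dog": [0.7, 0.7, 0.0, 0.0],
    "happy_person": [0.7, 0.0, 0.7, 0.0],
    "sunny_day": [0.0, 0.0, 0.0, 1.0],
}

def search_with_vector(item_id, limit=2):
    """Use an existing item's stored vector as the query vector; skip the item itself."""
    q = stored[item_id]
    others = [rid for rid in stored if rid != item_id]
    return sorted(others, key=lambda rid: dot(q, stored[rid]), reverse=True)[:limit]

print(search_with_vector("happy_person"))  # ['happy_dog', 'sunny_day']
```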
Now for the fun part: try it out yourself in the notebook! Experiment with your own sample data and query inputs, and give us a star!