Query time weights & why you should care
Getting quality results from vector database queries isn’t easy. Our experience in machine learning deployment for production use cases has revealed two basic things about representing data:
- the richer your source dataset, the better your chances of getting good results, provided your embeddings sufficiently represent your dataset
- different use cases make different parts of your overall dataset more important

Any system that achieves efficient, high quality retrieval has to capture the richness of your source dataset, and prioritize the parts of your data that fit your use case.
Two ways to weight the query: at definition and at execution
Our system lets you apply weights in two different ways:
- setting weights at query definition - lets you experiment and optimize without re-embedding your dataset
- setting weights when running the query - lets you (data scientist or user) fine-tune even after query definition
Weighting when you define the query
Superlinked’s Spaces are structured for embedding different attributes of your data separately, permitting you to weight each attribute individually - before concatenating them into a single vector - when you define your queries. This enables you to run experiments, tuning the weights of different vector parts without having to re-embed your dataset.

Let’s walk through how you set this up in Superlinked, using an example where you define two queries - one that optimizes on paragraph similarity, and another that optimizes on like count. After installing superlinked, you import the requisite modules: library, schema-related classes, index class, text_similarity and number spaces, query constructor, and display config (see cell 2 in the notebook). You then define your schema class and two spaces, and build an index on top of your spaces. Next, you define two queries: body_query, which weights text similarity twice as much as likes, and like_query, which weights likes twice as much as text similarity.
Results for body_query:

| | body | like_count | id |
|---|---|---|---|
| 0 | Growing computation power enables advancements in AI. | 10 | paragraph-2 |
| 1 | Glorious animals live in the wilderness. | 75 | paragraph-1 |

Results for like_query:

| | body | like_count | id |
|---|---|---|---|
| 0 | Glorious animals live in the wilderness. | 75 | paragraph-1 |
| 1 | Growing computation power enables advancements in AI. | 10 | paragraph-2 |
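To make the mechanics concrete, here is a minimal, library-independent sketch of weighting at query definition. It does not use Superlinked's API: the bag-of-words text encoder, the quarter-circle number encoder, the maximum like count of 100, the helper names, and the query text "growing computation power" are all assumptions made for illustration. What it shows is that each item is embedded once, as a concatenation of per-attribute parts, while the weights are applied only to the query vector, so changing them never requires re-embedding.

```python
import math
import re

# The two example paragraphs from the result tables above.
PARAGRAPHS = [
    {"id": "paragraph-1", "body": "Glorious animals live in the wilderness.", "like_count": 75},
    {"id": "paragraph-2", "body": "Growing computation power enables advancements in AI.", "like_count": 10},
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


# Toy vocabulary built from the corpus (assumption: stands in for a real text model).
VOCAB = sorted({tok for p in PARAGRAPHS for tok in tokenize(p["body"])})


def embed_text(text):
    """Toy text encoder: L2-normalized bag-of-words counts over VOCAB."""
    tokens = tokenize(text)
    counts = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


def embed_number(value, max_value=100.0):
    """Toy number encoder: a unit vector on a quarter circle.

    Larger values land closer to the point that encodes max_value.
    """
    clipped = min(max(value, 0.0), max_value)
    angle = (clipped / max_value) * math.pi / 2
    return [math.sin(angle), math.cos(angle)]


# Embed every item ONCE: concatenate the text part and the number part.
INDEX = {
    p["id"]: embed_text(p["body"]) + embed_number(p["like_count"])
    for p in PARAGRAPHS
}


def build_query_vector(text, body_weight, like_weight):
    """Scale each query part by its weight, then concatenate the parts."""
    body_part = [body_weight * x for x in embed_text(text)]
    like_part = [like_weight * x for x in embed_number(100.0)]  # prefer the most likes
    return body_part + like_part


def rank(query_vector):
    """Rank item ids by dot product with the weighted query vector."""
    scores = {
        item_id: sum(q * v for q, v in zip(query_vector, vec))
        for item_id, vec in INDEX.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Text similarity weighted twice as much as likes, and the reverse.
body_ranking = rank(build_query_vector("growing computation power", 1.0, 0.5))
like_ranking = rank(build_query_vector("growing computation power", 0.5, 1.0))

print(body_ranking)  # ['paragraph-2', 'paragraph-1']
print(like_ranking)  # ['paragraph-1', 'paragraph-2']
```

Because the dot product of two concatenated vectors is the sum of the dot products of their parts, scaling a query part by a weight scales that attribute's contribution to the final score. The stored item vectors in `INDEX` are identical for both queries.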
Weighting when you run the query - dynamic Params
In production systems, the developer generally defines the queries. The Superlinked approach gives you - the data scientist - the freedom to experiment with and fine-tune weights at query time, after the developer has defined the query. It can also give the user more power to specify what’s more relevant to them. You set this up by putting placeholder Params in the query definitions - Params that you can fill in dynamically, weighting one or another parameter, when you run your query.

Using our example setup and data above, let’s look at how you can set weights when running the query. As above, you import the requisite modules, but also import the Param class - which you’ll use to define dynamic parameters in queries (see cell 2 in the notebook). You then proceed with the setup as above (see cells 3-5). Now that you have your system set up, you can define your queries using dynamic Param placeholders, so that they can be filled in later.

In sum
Superlinked Spaces enable two different kinds of query time weighting: 1) weighting when defining the query, and 2) weighting when executing the query. Each has its own benefits, and neither requires you to rerank, build custom layers, or re-embed.
- Because Superlinked permits you to assign weights when defining your queries, you can experiment and optimize without having to re-embed your dataset.
- Assigning weights using dynamic parameters when you run the query offers the data scientist / user additional optimization control over what counts as relevant, even after query definition.
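
The placeholder idea behind weighting when you run the query can also be sketched in plain Python. This is not Superlinked's Param class or query API: `WeightPlaceholder`, `QueryTemplate`, and the per-space similarity numbers below are hypothetical, chosen only to illustrate how a query can be defined once with named placeholders and receive its weights when it is executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightPlaceholder:
    """Hypothetical stand-in for a dynamic parameter: a name to fill in later."""
    name: str


class QueryTemplate:
    """A query defined once, whose weights may be fixed numbers or placeholders."""

    def __init__(self, space_weights):
        # Maps a space name to either a float or a WeightPlaceholder.
        self.space_weights = space_weights

    def resolve(self, **values):
        """Replace every placeholder with the value supplied at run time."""
        resolved = {}
        for space, weight in self.space_weights.items():
            if isinstance(weight, WeightPlaceholder):
                if weight.name not in values:
                    raise KeyError(f"missing value for placeholder '{weight.name}'")
                resolved[space] = float(values[weight.name])
            else:
                resolved[space] = float(weight)
        return resolved

    def run(self, similarities, **values):
        """Rank item ids by the weighted sum of their per-space similarities."""
        weights = self.resolve(**values)
        scores = {
            item_id: sum(weights[space] * sim for space, sim in per_space.items())
            for item_id, per_space in similarities.items()
        }
        return sorted(scores, key=scores.get, reverse=True)


# Illustrative per-space similarities (made-up numbers, not library output).
SIMILARITIES = {
    "paragraph-1": {"body": 0.00, "like": 0.92},
    "paragraph-2": {"body": 0.65, "like": 0.16},
}

# The developer defines the query once, leaving both weights open.
query = QueryTemplate({
    "body": WeightPlaceholder("body_weight"),
    "like": WeightPlaceholder("like_weight"),
})

# The data scientist or user supplies the weights at execution time.
body_first = query.run(SIMILARITIES, body_weight=1.0, like_weight=0.5)
like_first = query.run(SIMILARITIES, body_weight=0.5, like_weight=1.0)

print(body_first)  # ['paragraph-2', 'paragraph-1']
print(like_first)  # ['paragraph-1', 'paragraph-2']
```

The query definition never changes between the two runs. Only the values bound to the placeholders differ, which is what lets weights be tuned after the query has been defined.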