Important Note: The RecencySpace feature is turned off by default due to the constraints of this release. For a detailed explanation and instructions on enabling it, refer to the Using Recency Space section of the documentation.
Note: The primary aim of this document is to guide you on how to operate the Superlinked system with your preferred configuration, rather than explaining the inner workings of the Superlinked components. For a deeper understanding of the components, please refer to the notebooks mentioned above.
1. Understanding the building blocks of the application
A functional application is structured around three core components:

- `index.py` - Defines the schemas, spaces, and indexes
- `query.py` - Specifies the queries
- `api.py` - Configures the executor that integrates the aforementioned components and other crucial configurations
It is crucial to understand that the definitions in `index.py` determine the vectors of your elements. Any modification to this file, such as adding a new space or altering the schema, will render the previously ingested data invalid, necessitating re-ingestion.
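For illustration, here is a minimal sketch of what such an `index.py` might contain. The schema, field names, and embedding model below are hypothetical choices, not prescribed values:

```python
from superlinked import framework as sl


# Hypothetical schema: your own fields and names will differ.
class ProductSchema(sl.Schema):
    id: sl.IdField
    description: sl.String


product = ProductSchema()

# A text similarity space that embeds the description field.
description_space = sl.TextSimilaritySpace(
    text=product.description,
    model="sentence-transformers/all-MiniLM-L6-v2",
)

# The index ties the spaces together; changing it invalidates ingested data.
index = sl.Index([description_space])
```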
2. Configuring the data loader
The system can load data from one or more files, either local or remote.
Note: In the absence of specified chunking, the loader will attempt to read and load the entire file into the system by default. Mind your memory! If possible, utilize file formats that support chunking and include the necessary parameters in `pandas_read_kwargs`, as indicated below.
Constraints:
- When running your preview locally, only local files or public files from remote sources can be used. Targeting an S3 or GCS bucket that requires authentication is not possible.
- When running in the cloud, for example on GCP, you can target a private Google Cloud Storage (GCS) bucket, but only one that the Google Compute Engine (GCE) instance has access to; the loader will use the instance's own authentication and authorization. No other private cloud sources, such as S3, can be used. Local files on the GCE instance, or any public file that does not require authorization, can also be used.
Incorporate Data Source
Create a specific source that can point to a local or a remote file. This file can be parsed and loaded into the system more efficiently than invoking the REST endpoint for each piece of data; see the sketch below.

Name of your data loader: The `name` parameter in `DataLoaderConfig` is optional. By default, it adopts the snake_case version of the schema's name used in `DataLoaderSource`. If you have multiple data loaders for the same schema or prefer a different name, simply set the `name` parameter accordingly. Note that the name will always be converted to snake_case. To see the configured data loaders in your system, refer to the API documentation.

The data loader is now configured, but it only runs if you send a request to the data loader endpoint! To see how to trigger it, check the API documentation.
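A sketch of such a source, reusing the hypothetical `product` schema from the earlier example; the file URL and loader name are invented for illustration:

```python
from superlinked import framework as sl

# Hypothetical file location; any local path or public URL works locally.
product_data_config = sl.DataLoaderConfig(
    "https://example.com/products.csv",
    sl.DataFormat.CSV,
    name="product_loader",  # optional; defaults to the schema name in snake_case
)

# Ties the loader configuration to the `product` schema defined in index.py.
product_source = sl.DataLoaderSource(product, product_data_config)
```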
3. Optional steps
Schema to column mappings
By default, the system will attempt to parse your file, so the column names should align with your schema attributes. If the `id` column, or any other schema field, has a different name in the file, it needs to be mapped to the schema you are attempting to load. To map field names to your schema, utilize the data parser as shown below:
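A sketch of the mapping, reusing the hypothetical `product` schema and `product_data_config` from above; the file column names are invented:

```python
from superlinked import framework as sl

# Map differently named file columns onto the schema fields (hypothetical names).
parser = sl.DataFrameParser(
    product,
    mapping={
        product.id: "product_id",
        product.description: "product_description",
    },
)

# Hand the parser to the source so the loader applies the mapping on ingestion.
product_source = sl.DataLoaderSource(product, product_data_config, parser=parser)
```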
Data Chunking
Data chunking allows you to load more data than your memory could typically handle at once. This is particularly beneficial when dealing with data sets that span multiple gigabytes. To prevent out-of-memory issues, it is recommended to use chunking when dealing with large datasets: set the `ONLINE_PUT_CHUNK_SIZE` environment variable to the desired number. To implement chunking, you will need to use either CSV or JSON formats (specifically JSONL, which includes a JSON object on each line). An example of what a chunking configuration might look like is shown below.

Set the `LOG_LEVEL` environment variable to `DEBUG` to monitor pandas memory usage metrics, which can help you determine optimal chunk sizes and estimate total memory requirements. These metrics are available regardless of whether chunking is enabled.
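A sketch of such a configuration, with a hypothetical file and an arbitrary chunk size of 10,000 rows passed through `pandas_read_kwargs`:

```python
from superlinked import framework as sl

# Hypothetical multi-gigabyte CSV; `chunksize` makes pandas read it in
# 10,000-row chunks instead of loading the whole file into memory.
chunked_config = sl.DataLoaderConfig(
    "https://example.com/products_large.csv",
    sl.DataFormat.CSV,
    pandas_read_kwargs={"chunksize": 10_000},
)

chunked_source = sl.DataLoaderSource(product, chunked_config)
```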
Customize your API
If you want to configure your API path, you can do that with the `RestEndpointConfiguration`, which can alter your URL (see the sketch after this list). By default, the API looks like:
- Query endpoint's path is `/api/v1/search/<query_name>`, which aligns with the schema: `/<api_root_path>/<query_path_prefix>/<query_name>`
- Data ingestion endpoint's path is `/api/v1/ingest/<schema_name>`, which aligns with the schema: `/<api_root_path>/<ingest_path_prefix>/<schema_name>`
- The rest of the API is not configurable; it is part of the so-called management API.
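A sketch under the assumption that `RestEndpointConfiguration` accepts the prefix fields named in the path schemas above; the prefix values are hypothetical, and a `query` object is assumed to be defined in query.py:

```python
from superlinked import framework as sl

# Hypothetical prefixes; they replace the default "search" and "ingest" parts.
endpoint_config = sl.RestEndpointConfiguration(
    query_path_prefix="retrieve",
    ingest_path_prefix="load",
    api_root_path="/api/v2",
)

# `query` is assumed to come from query.py; the query endpoint would then be
# served at /api/v2/retrieve/product_query.
executor = sl.RestExecutor(
    sources=[product_source],
    indices=[index],
    queries=[sl.RestQuery(sl.RestDescriptor("product_query"), query)],
    vector_database=sl.InMemoryVectorDatabase(),
    endpoint_configuration=endpoint_config,
)
```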
Using Recency Space
Recency Space has two current limitations:

- Recency embeddings become outdated over time as they are not recalculated periodically. Our encoder only needs a constant number of updates for this to work correctly, but that update mechanism has not been open-sourced yet - coming soon!
- At server startup, the application captures the server's current UTC timestamp as `now`. Each modification and restart of the application will result in a new timestamp, which does not dynamically update during runtime.
To enable the feature, set the `DISABLE_RECENCY_SPACE` environment variable to `false`. If you need to control the timestamp used as `now`, pass `EXECUTOR_DATA` to your executor, as in the sketch below.
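A sketch based on the context-data pattern used in the Superlinked notebooks; the import path, the `context_data` parameter on `RestExecutor`, and the timestamp value are assumptions to verify against your installed version:

```python
from superlinked import framework as sl
from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW

# Fix `now` to a specific UTC epoch timestamp (hypothetical value) so recency
# vectors are computed against a stable reference instead of server start time.
EXECUTOR_DATA = {CONTEXT_COMMON: {CONTEXT_COMMON_NOW: 1715000000}}

# Same executor as in the previous sketch, with the context data added.
executor = sl.RestExecutor(
    sources=[product_source],
    indices=[index],
    queries=[sl.RestQuery(sl.RestDescriptor("product_query"), query)],
    vector_database=sl.InMemoryVectorDatabase(),
    context_data=EXECUTOR_DATA,
)
```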
GPU acceleration
If your system's host machine is equipped with a GPU, this documentation provides guidance on leveraging it for computational tasks. GPU acceleration is currently supported for models that handle text or image embeddings and requires explicit activation. It is particularly effective when processing large batches of data, especially within the context of the data loading feature.

Ensure that your system has a GPU compatible with PyTorch and that the GPU drivers are up to date. For optimal performance, we recommend using NVIDIA GPUs, as they provide the best support for deep learning frameworks like PyTorch.

To enable GPU acceleration in Superlinked, configure the `GPU_EMBEDDING_THRESHOLD` environment variable. This variable determines when GPU embedding is activated based on batch size:
- 0 (Default): GPU embedding is disabled. All embeddings will be processed using the CPU.
- 1: Forces GPU embedding, regardless of batch size.
- 2 to 99999: Uses CPU for embedding if the batch size is below the specified value; otherwise, GPU is used. This allows for faster processing of small batches, where the CPU may be more efficient.
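For instance, with a threshold of 32, the large batches produced by the data loader would be embedded on the GPU while single-item REST ingestions would stay on the CPU. A minimal sketch, assuming the variable is set from Python before the server process starts (the value 32 is illustrative):

```python
import os

# Illustrative threshold: batches of 32 or more items are embedded on the GPU,
# smaller batches stay on the CPU. Must be set before the server starts.
os.environ["GPU_EMBEDDING_THRESHOLD"] = "32"
```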
Environment variables
The Superlinked Server accepts the following environment variables (see this recipe for inspiration on how to set these):

| Variable | Type | Explanation | Default Value |
|---|---|---|---|
| APP_MODULE_PATH | str | Path to server code files. | "superlinked_app" |
| DISABLE_RECENCY_SPACE | bool | Server will explicitly reject RecencySpace in an index (see above for more RecencySpace information). | True |
| EXPOSE_PII | bool | - | False |
| JSON_LOG_FILE | str | Filename for the JSON log file produced by the server. | None |
| LOG_AS_JSON | str | Produce logs in JSON format (avoids query truncation). | False |
| LOG_LEVEL | str | Python logging level; see https://docs.python.org/3/library/logging.html#logging-levels | "INFO" |
| PERSISTENCE_FOLDER_PATH | str | - | "in_memory_vdb" |
| SERVER_HOST | str | IP address of the server. | "0.0.0.0" |
| SERVER_PORT | int | Port of the server. | 8080 |
| WORKER_COUNT | int | Number of workers. | 1 |