MongoDB

This document explains how to integrate MongoDB with Superlinked and use it as your vector database.

Configuring your existing managed MongoDB

To integrate MongoDB with Superlinked, ensure you are using a version that supports Atlas Vector Search capabilities. Refer to the MongoDB documentation for more information.

Superlinked needs to list, create, and delete Atlas Search indexes in MongoDB. At the time of writing, MongoDB restricts this functionality by database instance size (cluster tier). On anything below M10, a standard database user cannot create, list, or delete Atlas Search indexes; these operations are available only through the Atlas Administration API. You can read more about this limitation and about the Administration API in the MongoDB documentation. To support all instance sizes, Superlinked uses the Administration API to manage the indexes.

For this reason, an API key with the Project Data Access Admin role is required. Instructions for creating one are given below.

Note: The Administration API also requires your project_id and cluster_name. How to find both is described below.
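
To make the role of project_id and cluster_name concrete, here is a minimal sketch of how the two values could be combined into an Administration API URL for a cluster's search indexes. The helper name search_indexes_url is hypothetical, and the endpoint path is an assumption based on version 2 of the Atlas Administration API; verify it against the MongoDB documentation. Superlinked's internal calls may differ. Authentication with the API key's public and private key is deliberately left out.

```python
from urllib.parse import quote

# Assumed base URL of the Atlas Administration API (v2); verify before relying on it.
ATLAS_BASE_URL = "https://cloud.mongodb.com/api/atlas/v2"


def search_indexes_url(project_id: str, cluster_name: str) -> str:
    """Build the (assumed) URL for the search indexes of one cluster.

    Both values are percent-escaped so unusual cluster names cannot break the path.
    """
    project = quote(project_id, safe="")
    cluster = quote(cluster_name, safe="")
    return f"{ATLAS_BASE_URL}/groups/{project}/clusters/{cluster}/search/indexes"


# Placeholder values: the project ID is the sample ID used in this document.
print(search_indexes_url("12755aca606daa697d3e30b9", "Cluster0"))
```

The sketch only shows why both identifiers are needed: the project ID selects the project ("group" in API terms) and the cluster name selects the cluster inside it.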

Modifications in your configuration

To integrate MongoDB, you need to add the MongoDBVectorDatabase class and include it in the executor. Here’s how you can do it:

from superlinked.framework.dsl.storage.mongo_db_vector_database import MongoDBVectorDatabase

vector_database = MongoDBVectorDatabase(
    host="<USER>:<PASSWORD>@<HOST_URL>", # The DB's host URL, prefixed with the username and password
    db_name="<DATABASE_NAME>", # Name of your database inside your cluster. You must create it yourself; the system won't do it automatically
    cluster_name="<CLUSTER_NAME>", # Name of your cluster inside your project
    project_id="<PROJECT_ID>", # The ID (not the name) of your project. See the note below this box for how to find it
    admin_api_user="<API_USER>", # The generated API key's user, called the public key in Atlas
    admin_api_password="<API_PASSWORD>", # The API key's password, generated by MongoDB and called the private key in Atlas
    default_query_limit=10, # Optional: the maximum number of query results returned. Defaults to 10 if not set.
    # Any other keyword arguments are passed through to the MongoClient. Read more about the possible parameters below
)
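
If the username or password contains characters such as @, : or /, they usually need to be percent-escaped before they are placed in a MongoDB connection string. The sketch below builds the host value from environment variables instead of hard-coding credentials. The helper name build_host and the environment variable names are hypothetical, and the sketch assumes that Superlinked uses the host value as part of a MongoDB URI without escaping it itself.

```python
import os
from urllib.parse import quote_plus


def build_host(user: str, password: str, host_url: str) -> str:
    """Return '<USER>:<PASSWORD>@<HOST_URL>' with the credentials percent-escaped."""
    return f"{quote_plus(user)}:{quote_plus(password)}@{host_url}"


# Hypothetical environment variable names; the fallbacks are demo values only.
host = build_host(
    os.environ.get("MONGO_USER", "app_user"),
    os.environ.get("MONGO_PASSWORD", "p@ss:word/1"),
    os.environ.get("MONGO_HOST_URL", "cluster0.example.mongodb.net"),
)
print(host)
```

The resulting string can then be passed as the host argument of MongoDBVectorDatabase.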

Project ID: To find your Project ID, select your organization in the top left corner of the Atlas UI, then find your project in the list (don't click on it). In the last column ("Actions"), expand the menu by clicking the ellipsis (...), then select "Copy Project ID" to copy it to your clipboard.

Alternative: Click on your project in Atlas and read the ID from the URL. In https://cloud.mongodb.com/v2/12755aca606daa697d3e30b9#/overview, the project ID is 12755aca606daa697d3e30b9, the part after https://cloud.mongodb.com/v2/ and before the #. The organization ID looks very similar, so make sure you copy the ID after you have selected the project.

Extra parameters: Any extra parameters are passed to the PyMongo client, MongoClient. Please read the PyMongo documentation for more information.
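
The URL rule above can be sketched as a small helper. The function name project_id_from_url is hypothetical, and the pattern assumes that a project ID is a 24-character lowercase hexadecimal string, as in the sample URL.

```python
import re
from urllib.parse import urlparse

# Assumption: the ID is 24 lowercase hex characters directly after /v2/.
_PROJECT_ID_PATH = re.compile(r"^/v2/([0-9a-f]{24})(?:/|$)")


def project_id_from_url(url: str) -> str:
    """Extract the project ID from an Atlas project URL."""
    # urlparse puts everything after '#' in .fragment, so .path ends before it.
    match = _PROJECT_ID_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"No project ID found in URL: {url}")
    return match.group(1)


print(project_id_from_url("https://cloud.mongodb.com/v2/12755aca606daa697d3e30b9#/overview"))
```

Note that the helper cannot tell a project ID from an organization ID, so the URL still has to be copied after the project has been selected.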
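
As an illustration of extra parameters, the sketch below collects two MongoClient options in a dictionary and forwards them as keyword arguments. serverSelectionTimeoutMS and appname are existing MongoClient options, but the values are arbitrary. The function make_vector_database and the settings dictionary (holding host, db_name, and the other arguments shown above) are hypothetical names used only for this example.

```python
from typing import Any

# MongoClient options; the values here are illustrative only.
client_options: dict[str, Any] = {
    "serverSelectionTimeoutMS": 5000,  # give up on server selection after 5 seconds
    "appname": "superlinked-demo",  # application name reported to the server
}


def make_vector_database(settings: dict[str, Any], options: dict[str, Any]) -> Any:
    """Create the vector database, forwarding `options` to the MongoClient."""
    # Imported lazily so this sketch can be read and loaded without Superlinked installed.
    from superlinked.framework.dsl.storage.mongo_db_vector_database import (
        MongoDBVectorDatabase,
    )

    return MongoDBVectorDatabase(**settings, **options)
```

Keeping the client options in their own dictionary makes it easy to see which arguments belong to Superlinked and which are handed on to PyMongo.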

Once you have configured the MongoDBVectorDatabase, set it as your vector_database in the RestExecutor:

...
executor = RestExecutor(
    sources=[source],
    indices=[index],
    queries=[RestQuery(RestDescriptor("query"), query)],
    vector_database=vector_database, # Or whichever variable you assigned your `MongoDBVectorDatabase` to
)
...

Start a managed MongoDB instance

A step-by-step guide to setting up a database, a user, and the required API key.

Creating the Database:

  1. Navigate to MongoDB Atlas and sign in.

  2. Create your cluster. The cluster name will be needed for the configuration mentioned above. The other options can be set however you like; they do not affect Superlinked's functionality.

  3. Click on the Database option in the left menu column.

  4. Once the cluster is created, click on its name and then go to the collections tab or click on the Browse Collections button.

  5. Click on Add My Own Data and provide a name for your database and collection. The database name will be required for the configuration above. The collection name does not matter, and the collection can be deleted later because Superlinked will create its own.

Creating a Database User and Adding Your IP:

  1. Click on the Database option on the left.

  2. Click the Connect button next to your cluster's name.

  3. In the pop-up window:

    1. Click on Allow access from Anywhere, or select Add a different IP address and enter the IP address of your VM or local machine.

    2. Enter the username and password for your user. These credentials will be needed for the configuration above.

Creating the API Key:

  1. Click on the Access Manager selector at the top left corner next to your organization selector and select your project.

  2. Go to the API Keys tab.

  3. Provide a name for the API key and select the Project Data Access Admin role in the Project Permissions selector.

  4. Copy the private key, as it will not be accessible again. The public key and private key are your admin_api_user and admin_api_password, respectively, in the configuration above.

Example app with MongoDB

You can find an example that utilizes MongoDB here.
