The VectorDatabase class serves as the abstract foundation for all vector database implementations in Superlinked. It defines the interface that concrete vector database implementations must follow to provide persistent storage and retrieval of vector embeddings.

Constructor

VectorDatabase()
This is an abstract base class and cannot be instantiated directly. Use concrete implementations for specific vector database providers.

Architecture

The VectorDatabase class ensures that any concrete implementation provides a connector to the underlying vector database through the _vdb_connector property.

Attributes

_vdb_connector
VDBConnectorT
required
An abstract property that concrete implementations must override to return an instance of a VDBConnector for the specific database type.
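
The pattern described above can be sketched with Python's standard `abc` and `typing` modules. This is a simplified illustration, not Superlinked's source: `VDBConnectorSketch`, `VectorDatabaseSketch`, `ToyConnector`, `ToyVectorDatabase`, and `can_instantiate` are hypothetical names introduced for the example.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar


class VDBConnectorSketch:
    """Hypothetical stand-in for a connector base type."""


ConnectorT = TypeVar("ConnectorT", bound=VDBConnectorSketch)


class VectorDatabaseSketch(ABC, Generic[ConnectorT]):
    """Abstract base whose only requirement is a connector property."""

    @property
    @abstractmethod
    def _vdb_connector(self) -> ConnectorT:
        """Return the connector for the underlying database."""


class ToyConnector(VDBConnectorSketch):
    """Hypothetical connector used only in this example."""


class ToyVectorDatabase(VectorDatabaseSketch[ToyConnector]):
    """Concrete subclass that overrides the abstract property."""

    def __init__(self) -> None:
        self._connector = ToyConnector()

    @property
    def _vdb_connector(self) -> ToyConnector:
        return self._connector


def can_instantiate(cls: type) -> bool:
    """Report whether calling cls() succeeds or fails with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch, calling `VectorDatabaseSketch()` raises `TypeError` because the abstract property has no implementation, while `ToyVectorDatabase()` succeeds because it overrides `_vdb_connector`.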

Inheritance Hierarchy

The VectorDatabase class serves as the base for all vector database implementations.

Inheritance Chain: VectorDatabase → ABC + Generic

Vector Database Features

Vector databases provide several key capabilities:

Persistent Storage

  • Store vector embeddings with associated metadata
  • Maintain data durability across application restarts
  • Scale storage capacity based on data volume

Similarity Search

  • Run efficient nearest neighbor search algorithms
  • Support various distance metrics (cosine, Euclidean, etc.)
  • Use optimized indexing for fast retrieval
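
To make the distance metrics concrete, here is a minimal brute-force sketch in plain Python. It scans every stored vector, which real vector databases avoid through indexing. The function names and the `vectors` dictionary layout are assumptions made for this example, not part of any Superlinked API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Straight-line distance between a and b (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=1, metric="cosine"):
    """Return the ids of the k stored vectors closest to the query."""
    if metric == "cosine":
        ranked = sorted(
            vectors,
            key=lambda vid: cosine_similarity(query, vectors[vid]),
            reverse=True,
        )
    elif metric == "euclidean":
        ranked = sorted(
            vectors, key=lambda vid: euclidean_distance(query, vectors[vid])
        )
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return ranked[:k]
```

The choice of metric can change the ranking: for the query `[1.0, 0.0]`, a stored vector `[10.0, 0.0]` is the best cosine match (same direction), while `[1.0, 0.5]` is the best Euclidean match (smaller distance).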

Filtering and Querying

  • Combine vector similarity with metadata filtering
  • Support complex query conditions
  • Handle large-scale concurrent queries
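
Combining similarity with metadata filtering can be sketched as "filter first, then rank". The `Record` type, the equality-only `where` conditions, and the dot-product score below are assumptions chosen to keep the example short; actual databases support richer conditions and may apply filters inside the index.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stored item: an id, a vector, and free-form metadata."""

    id: str
    vector: list
    metadata: dict = field(default_factory=dict)


def filtered_search(query, records, where, k=3):
    """Keep records whose metadata matches `where`, then rank by dot product."""
    candidates = [
        record
        for record in records
        if all(record.metadata.get(key) == value for key, value in where.items())
    ]
    candidates.sort(
        key=lambda record: sum(q * v for q, v in zip(query, record.vector)),
        reverse=True,
    )
    return [record.id for record in candidates[:k]]
```

An empty `where` dictionary matches every record, so the same function also covers unfiltered search.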

Performance Optimization

  • Index management for search performance
  • Memory and disk optimization strategies
  • Clustering and sharding capabilities

Implementation Requirements

Concrete vector database implementations must provide:
  1. Connection Management: Establish and maintain connections to the database
  2. Vector Operations: Store, update, and retrieve vector embeddings
  3. Search Functionality: Perform similarity searches with filtering
  4. Index Management: Create and maintain search indices
  5. Error Handling: Graceful handling of database errors and timeouts
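
As a rough illustration of requirements 1, 2, and 5, the toy in-memory store below manages a connection flag, supports storing, updating, and retrieving vectors, and reports failures through a single exception type. `ToyVectorStore` and `VectorStoreError` are hypothetical names, and search and index management are left out for brevity; a real implementation would delegate these operations to the database client.

```python
class VectorStoreError(Exception):
    """Raised for any failure in the toy store."""


class ToyVectorStore:
    """In-memory stand-in for a database-backed vector store."""

    def __init__(self, dimension):
        self._dimension = dimension
        self._connected = False
        self._vectors = {}

    # Connection management
    def connect(self):
        self._connected = True

    def close(self):
        self._connected = False

    def __enter__(self):
        self.connect()
        return self

    def __exit__(self, exc_type, exc, traceback):
        self.close()

    def _require_connection(self):
        if not self._connected:
            raise VectorStoreError("store is not connected")

    # Vector operations
    def upsert(self, vector_id, vector):
        """Insert a new vector or overwrite an existing one."""
        self._require_connection()
        if len(vector) != self._dimension:
            raise VectorStoreError(
                f"expected {self._dimension} dimensions, got {len(vector)}"
            )
        self._vectors[vector_id] = list(vector)

    def get(self, vector_id):
        """Return a stored vector, or raise VectorStoreError if it is missing."""
        self._require_connection()
        try:
            return self._vectors[vector_id]
        except KeyError as error:
            raise VectorStoreError(f"unknown id: {vector_id}") from error
```

Using the store as a context manager (`with ToyVectorStore(dimension=2) as store:`) ensures the connection is closed even when an operation fails.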

Database Selection Guide

Production Workloads

Qdrant: High-performance vector search with advanced filtering capabilities. Supports both cloud and self-hosted deployments.
Redis: Fast in-memory vector search with persistence options. A natural fit for applications already using Redis.
MongoDB: Integrated document and vector search. Suits applications with existing MongoDB infrastructure.

Development & Testing

InMemoryVectorDatabase: Suited to development, testing, and prototyping. Requires no external dependencies.
TopKVectorDatabase: Useful when only the top-K most similar results are needed, which can reduce memory use.
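
The general idea behind keeping only top-K results can be shown with a bounded min-heap from Python's standard `heapq` module. This sketches the technique only and makes no claim about how TopKVectorDatabase is implemented; `top_k` and its `(score, id)` input format are assumptions for the example.

```python
import heapq


def top_k(scored_items, k):
    """Return ids of the k highest-scoring items while holding at most k in memory.

    `scored_items` is any iterable of (score, item_id) pairs.
    """
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the weakest of the current best k
    for score, item_id in scored_items:
        if len(heap) < k:
            heapq.heappush(heap, (score, item_id))
        elif score > heap[0][0]:
            # Replace the weakest kept item with the better candidate.
            heapq.heapreplace(heap, (score, item_id))
    return [item_id for _, item_id in sorted(heap, reverse=True)]
```

Because the heap never grows beyond `k` entries, memory use depends on `k` rather than on the number of candidates scanned.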

Best Practices

Database Configuration

Connection Limits: Configure appropriate connection limits and timeouts based on your expected query volume and latency requirements.

Performance Tuning

Index Strategy: Choose appropriate indexing strategies based on your vector dimensions, data size, and query patterns. Each database provides different indexing algorithms optimized for specific use cases.

Data Management

Backup Strategy: Implement regular backup procedures for production vector databases to prevent data loss and enable disaster recovery.

Integration Pattern

Vector databases integrate into the Superlinked pipeline as storage backends:
  1. Vector Generation: Spaces transform data into vectors
  2. Index Organization: Indices organize vectors for efficient querying
  3. Storage: VectorDatabase implementations persist vectors
  4. Retrieval: Query operations search stored vectors
  5. Results: Matching vectors are returned with metadata
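
The five steps above can be sketched end to end in plain Python. Everything here is a toy assumption: `to_vector` stands in for a space, a single dictionary stands in for both the index and the storage backend, and squared Euclidean distance stands in for the database's search.

```python
def to_vector(text):
    """Step 1: turn data into a vector, here [length, vowel count]."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def run_pipeline(documents, query_text, k=1):
    """Run the toy pipeline and return (id, metadata) pairs for the best matches."""
    # Steps 1-3: vectorize each document, organize by id, and persist it.
    store = {
        doc_id: {"vector": to_vector(text), "metadata": {"text": text}}
        for doc_id, text in documents.items()
    }

    # Step 4: search stored vectors by squared distance to the query vector.
    query = to_vector(query_text)

    def squared_distance(entry):
        return sum((q - v) ** 2 for q, v in zip(query, entry["vector"]))

    ranked = sorted(store.items(), key=lambda item: squared_distance(item[1]))

    # Step 5: return matching ids together with their metadata.
    return [(doc_id, entry["metadata"]) for doc_id, entry in ranked[:k]]
```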

The abstract nature of this class ensures consistent behavior across different vector database providers while allowing each implementation to optimize for its specific strengths and capabilities.