The PostgreSQL Vector Store node connects your pipeline to a PostgreSQL database and uses the pgvector extension to store and search vector embeddings.

Key capabilities
- Stores high-dimensional vector embeddings within a standard PostgreSQL database.
- Utilizes the pgvector extension to perform efficient similarity and distance searches (ANN).
- Supports standard PostgreSQL tables, allowing vector data to be stored alongside structured metadata.
- Enables durable and scalable vector storage by leveraging PostgreSQL’s robust, transaction-safe architecture.
Configuration
Basic Configuration
- Host: The hostname or IP address of the PostgreSQL server.
  - Example – your-postgres-host.example.com
- Port: The port number for the PostgreSQL server connection.
  - Example – 5432
- User: The username for database authentication.
  - Example – postgres
- Database: The name of the database to connect to.
  - Example – aparavi
- Table: The name of the database table where vectors will be stored.
  - Example – aparavi
- Retrieval Score: The minimum similarity score (from 0.0 to 1.0) for a result to be considered relevant.
  - Example – 0.5
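For orientation, these values map onto an ordinary PostgreSQL connection. The sketch below is a minimal illustration in Python using psycopg2 with the example values above; the table definition, column names, and the 1536-dimension vector column are illustrative assumptions (the dimension should match your embedding model), not the connector's actual schema.

```python
import psycopg2

# Connect with the example Basic Configuration values
# (the Password is covered in the API Key section below).
conn = psycopg2.connect(
    host="your-postgres-host.example.com",  # Host
    port=5432,                              # Port
    user="postgres",                        # User
    password="<your-password>",             # Password
    dbname="aparavi",                       # Database
)

# Hypothetical backing table: a pgvector column stored next to ordinary
# relational columns. Table name, column names, and dimension are assumptions.
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS aparavi (
            id        bigserial PRIMARY KEY,
            content   text,
            metadata  jsonb,
            embedding vector(1536)
        );
    """)
conn.close()
```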
Model Selection
- Similarity Metric: Choose the algorithm for comparing vector similarity. cosine is recommended for embeddings generated by transformer models. l2 (Euclidean) and inner_product are also available for other use cases.
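As a quick illustration of how the metrics differ, the sketch below (plain Python, independent of the connector) scores the same pair of candidate vectors under all three: cosine rewards direction regardless of magnitude, while L2 and inner product are sensitive to magnitude.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
aligned = [0.9, 0.1]   # nearly the same direction, small magnitude
large = [5.0, 3.0]     # much larger magnitude, wider angle

# Cosine prefers `aligned` (direction only); inner product prefers `large`
# (magnitude counts); L2 also prefers `aligned` here because it is closer.
print(cosine_similarity(query, aligned), cosine_similarity(query, large))
print(l2_distance(query, aligned), l2_distance(query, large))
print(inner_product(query, aligned), inner_product(query, large))
```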
API Key
- Password: The password associated with the specified User for database authentication. Although this section is labeled API Key, PostgreSQL authenticates with a database password rather than an API key, and it is required for access.
Inputs and Outputs
Input Channels
- documents: Receives vectorized documents to be stored. Each input is a JSON object containing the vector embedding and any associated metadata (see the payload sketch after this section). PostgreSQL itself imposes no practical limit here, but for best performance each document passed through the upstream embedding model should not exceed 8192 tokens.
- questions: Receives vectorized queries for searching similar documents. The input format is a JSON object containing the query vector.
Output Channels
- documents: Emits the full document information retrieved from PostgreSQL for the best-matching results. The format is a stream of JSON objects.
- answers: Returns a condensed set of results containing vectors and metadata based on the retrieval score. The format is a stream of JSON objects.
- questions: Forwards the original incoming query vector downstream for logging or further processing. The format is the original JSON object.
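The exact field names depend on the upstream nodes, but conceptually the channel payloads resemble the hypothetical sketch below; the key names are illustrative assumptions, not a fixed schema.

```python
# Hypothetical payload shapes for illustration only; actual key names depend
# on the upstream embedding model and the pipeline configuration.

# "documents" input: an embedding plus the content and metadata to store.
document_in = {
    "embedding": [0.012, -0.094, 0.233],          # vector from the embedding model
    "content": "Quarterly report for FY2024 ...",  # original text
    "metadata": {"source": "reports/q4.pdf", "page": 3},
}

# "questions" input: the vectorized query.
question_in = {
    "embedding": [0.004, -0.110, 0.198],
    "content": "What were Q4 revenues?",
}

# "answers" output: a condensed match at or above the Retrieval Score.
answer_out = {
    "score": 0.83,
    "metadata": {"source": "reports/q4.pdf", "page": 3},
    "embedding": [0.012, -0.094, 0.233],
}
```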
Supported Model Variants
| Similarity Metric | Description | Max Tokens | Optimized for |
|---|---|---|---|
| cosine | Cosine distance: one minus the cosine of the angle between two vectors. | N/A | Semantic similarity with transformer-based embeddings. |
| l2 | L2 (Euclidean) distance: the straight-line distance between two vectors. | N/A | Image or feature similarity where magnitude matters. |
| inner_product | Negative inner product. Can be faster, but requires normalized vectors for accurate results. | N/A | Maximizing performance with normalized embeddings. |
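pgvector exposes each of these metrics through a distance operator, so the Similarity Metric setting effectively chooses the operator used in the generated search query. The mapping below is a sketch under that assumption, reusing the hypothetical table and column names from the Configuration example.

```python
# Assumed mapping from the Similarity Metric setting to pgvector's distance
# operators; for all three, a smaller distance means a better match.
PGVECTOR_OPERATORS = {
    "cosine": "<=>",         # cosine distance
    "l2": "<->",             # L2 (Euclidean) distance
    "inner_product": "<#>",  # negative inner product
}

def build_search_sql(metric: str, limit: int = 5) -> str:
    """Illustrative similarity-search query for the chosen metric
    (hypothetical table and column names from the Configuration example)."""
    op = PGVECTOR_OPERATORS[metric]
    return (
        "SELECT id, metadata, embedding {op} %s AS distance "
        "FROM aparavi ORDER BY embedding {op} %s LIMIT {limit};"
    ).format(op=op, limit=limit)

print(build_search_sql("cosine"))
```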
Data Flow Process
- Request Handling: Input documents or questions arriving on their respective channels are received as JSON objects. The connector transforms these objects into SQL statements compatible with the pgvector extension. For new documents, this is an INSERT statement. For queries, this is a SELECT statement with a vector similarity search clause (e.g., ORDER BY embedding <=> query_vector, where the distance operator follows the selected Similarity Metric), as sketched after this list.
- Response Handling: The results of the SQL query are retrieved from the database. The connector formats these database rows back into JSON objects and emits them on the appropriate output channel (documents or answers).
- Connection Management: The connector manages a connection pool to the PostgreSQL database to handle requests efficiently and concurrently.
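Putting these steps together, the sketch below shows roughly what the generated INSERT and SELECT could look like for one document and one question, reusing the hypothetical connection and table from the Configuration example; the conversion of cosine distance to a similarity score (1 minus the distance) and the column names are assumptions, not the connector's exact behavior.

```python
import json
import psycopg2

def to_pgvector(v):
    # Text form accepted by pgvector, e.g. "[0.1,0.2,0.3]".
    return "[" + ",".join(str(x) for x in v) + "]"

conn = psycopg2.connect(host="your-postgres-host.example.com", port=5432,
                        user="postgres", password="<your-password>", dbname="aparavi")

doc_vec = [0.012] * 1536     # illustrative; really produced by the embedding model
query_vec = [0.011] * 1536
retrieval_score = 0.5        # Retrieval Score from the node configuration

with conn, conn.cursor() as cur:
    # Request handling, "documents" channel: one INSERT per incoming document.
    cur.execute(
        "INSERT INTO aparavi (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
        ("Quarterly report ...", json.dumps({"source": "reports/q4.pdf"}), to_pgvector(doc_vec)),
    )

    # Request handling, "questions" channel: SELECT ordered by cosine distance.
    cur.execute(
        "SELECT content, metadata, 1 - (embedding <=> %s::vector) AS score "
        "FROM aparavi ORDER BY embedding <=> %s::vector LIMIT 5",
        (to_pgvector(query_vec), to_pgvector(query_vec)),
    )

    # Response handling: keep rows at or above the Retrieval Score and emit
    # them as JSON objects on the "answers" channel (assumed behavior).
    answers = [
        {"content": content, "metadata": metadata, "score": score}
        for content, metadata, score in cur.fetchall()
        if score >= retrieval_score
    ]

conn.close()
print(answers)
```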
Common Use Cases
- Semantic Search: Find documents conceptually similar to a user’s query rather than just matching keywords. Wire the pipeline as: Source Connector -> Embedding Model -> PostgreSQL Vector Store.
- Retrieval-Augmented Generation (RAG): Provide relevant context to a Large Language Model (LLM) to improve answer quality and reduce hallucinations. Wire the pipeline as: Source Connector -> Embedding Model -> PostgreSQL Vector Store -> LLM.
