Maintained by deepset

Integration: pgvector

A Document Store for storing documents in pgvector and retrieving them from it




pgvector is an extension for PostgreSQL that adds support for vector similarity search.
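
To make "vector similarity search" concrete, here is a minimal pure-Python sketch of ranking vectors by cosine similarity. It only illustrates the idea: pgvector performs this kind of search inside PostgreSQL, and the `cosine_similarity` and `top_k` helpers and the toy vectors below are hypothetical names invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return the names of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy two-dimensional "embeddings" (illustrative values only)
toy_vectors = {"cats": [1.0, 0.0], "kittens": [0.9, 0.1], "cars": [0.0, 1.0]}
nearest = top_k([1.0, 0.0], toy_vectors, k=2)
print(nearest)
```

A real embedding model produces vectors with hundreds of dimensions, which is why a database-side index is useful in practice.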

To quickly set up a PostgreSQL database with pgvector, you can use Docker:

docker run -d -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=postgres ankane/pgvector

For more information on how to install pgvector, visit the pgvector GitHub repository.

Use pip to install pgvector-haystack:

pip install pgvector-haystack


Define the connection string to your PostgreSQL database in the PG_CONN_STR environment variable. For example:

export PG_CONN_STR="postgresql://postgres:postgres@localhost:5432/postgres"
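
If a connection fails, it can help to check how the connection string breaks down. The following is a small sketch using only the Python standard library; `describe_conn_str` is a hypothetical helper written for this example, not part of pgvector-haystack, and it deliberately leaves the password out of its output.

```python
from urllib.parse import urlsplit


def describe_conn_str(conn_str):
    """Split a PostgreSQL connection URL into its parts (password omitted)."""
    parts = urlsplit(conn_str)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# The example value from above; in practice you might pass os.environ["PG_CONN_STR"]
info = describe_conn_str("postgresql://postgres:postgres@localhost:5432/postgres")
print(info)
```

The user, port, and database shown here should match the values passed to the Docker command above.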

Once installed, initialize PgvectorDocumentStore:

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore

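# With no arguments, the store uses its defaults and reads the connection string from PG_CONN_STR.
document_store = PgvectorDocumentStore()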

Writing Documents to PgvectorDocumentStore

To write documents to PgvectorDocumentStore, create an indexing pipeline.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

# file_paths is a list of paths to the text files you want to index
indexing.run({"converter": {"sources": file_paths}})
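
The `file_paths` variable used when running the indexing pipeline is not defined in the snippet. One possible way to build it is sketched below, assuming your documents are `.txt` files in a single directory; `collect_text_files` is a hypothetical helper, and the temporary directory only stands in for your own data folder.

```python
import tempfile
from pathlib import Path


def collect_text_files(directory, pattern="*.txt"):
    """Return a sorted list of files in `directory` matching `pattern`."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


# Demonstration with a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.txt").write_text("second", encoding="utf-8")
    (Path(tmp) / "a.txt").write_text("first", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("ignored", encoding="utf-8")
    file_names = [p.name for p in collect_text_files(tmp)]

print(file_names)
```

The resulting list of `Path` objects can then be passed as the `sources` value for the converter.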

Retrieval from PgvectorDocumentStore

You can retrieve documents that are semantically similar to a given query using a simple pipeline that includes the PgvectorEmbeddingRetriever.

from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.pgvector import PgvectorEmbeddingRetriever
from haystack import Pipeline

querying = Pipeline()
querying.add_component("embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", PgvectorEmbeddingRetriever(document_store=document_store, top_k=3))
querying.connect("embedder", "retriever")

results = querying.run({"embedder": {"text": "my query"}})
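
The pipeline output is a dictionary keyed by component name, with the retrieved documents under the retriever's `documents` output. Here is a minimal sketch of reading it; `summarize_hits` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for Haystack `Document` instances so the snippet runs without a database.

```python
from types import SimpleNamespace


def summarize_hits(results, component="retriever"):
    """Return (score, content) pairs for the documents a retriever returned."""
    documents = results[component]["documents"]
    return [(doc.score, doc.content) for doc in documents]


# Stand-in for the dictionary returned by the querying pipeline
fake_results = {
    "retriever": {
        "documents": [
            SimpleNamespace(content="pgvector adds vector search", score=0.91),
            SimpleNamespace(content="PostgreSQL is a database", score=0.5),
        ]
    }
}

hits = summarize_hits(fake_results)
for score, content in hits:
    print(score, content)
```

With a real pipeline, you would pass the `results` dictionary instead of `fake_results`.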

You can also retrieve Documents based on keyword matching with the PgvectorKeywordRetriever.

from haystack_integrations.components.retrievers.pgvector import PgvectorKeywordRetriever

retriever = PgvectorKeywordRetriever(document_store=document_store, top_k=3)
results = retriever.run(query="my query")


You can find a code example showing how to use the Document Store and the Retriever under the examples/ folder of this repo.


pgvector-haystack is distributed under the terms of the Apache-2.0 license.