โญ Haystack hit 25k GitHub stars! Thanks for helping us reach this milestone
Maintained by deepset

Integration: AlloyDB

A Document Store for storing and retrieving documents in Google Cloud AlloyDB with pgvector

Authors
deepset
Gary Badwal

Overview

AlloyDB is a fully managed, PostgreSQL-compatible database service on Google Cloud, optimised for demanding transactional and analytical workloads.

This integration provides a Haystack DocumentStore backed by AlloyDB with the pgvector extension, enabling both dense vector similarity search and full-text keyword search.

Connections are established through the AlloyDB Python Connector, which handles IAM-based authentication and TLS encryption without requiring manual firewall rules or IP allowlisting.

Installation

pip install alloydb-haystack

Usage

Set the following environment variables to point at your AlloyDB instance:

Variable              Description
ALLOYDB_INSTANCE_URI  AlloyDB instance URI: projects/P/locations/R/clusters/C/instances/I
ALLOYDB_USER          Database user (or IAM principal for IAM auth)
ALLOYDB_PASSWORD      Database password (not required when enable_iam_auth=True)
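
Before constructing the store, it can help to confirm that these variables are set and that the URI has the documented shape. The following is a minimal sketch only: `check_alloydb_env` and the regular expression are illustrative helpers, not part of the integration.

```python
import os
import re

# Shape documented above: projects/P/locations/R/clusters/C/instances/I
_URI_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/clusters/[^/]+/instances/[^/]+$"
)


def check_alloydb_env(env=None, enable_iam_auth=False):
    """Return a list of problems found in the AlloyDB environment variables."""
    env = os.environ if env is None else env
    problems = []
    if not _URI_PATTERN.match(env.get("ALLOYDB_INSTANCE_URI", "")):
        problems.append("ALLOYDB_INSTANCE_URI is missing or malformed")
    if not env.get("ALLOYDB_USER"):
        problems.append("ALLOYDB_USER is not set")
    # The password is only needed for password-based authentication.
    if not enable_iam_auth and not env.get("ALLOYDB_PASSWORD"):
        problems.append("ALLOYDB_PASSWORD is not set")
    return problems


# Hypothetical values, passed explicitly so the check does not touch os.environ.
example_env = {
    "ALLOYDB_INSTANCE_URI": "projects/p/locations/r/clusters/c/instances/i",
    "ALLOYDB_USER": "db-user",
}
print(check_alloydb_env(example_env))
```

Calling `check_alloydb_env()` with no arguments inspects the real process environment instead.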

With the package installed and these variables set, initialize AlloyDBDocumentStore:

from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

document_store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
    recreate_table=True,  # recreates the table if it already exists, discarding stored documents
)

Authentication

The integration supports both password-based authentication (default) and IAM-based authentication via a Google Cloud service account.

Password Authentication

from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

# Reads ALLOYDB_INSTANCE_URI, ALLOYDB_USER, and ALLOYDB_PASSWORD from the environment
document_store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
)

IAM Authentication

When using a service account for database access, set enable_iam_auth=True:

from haystack.utils import Secret
from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

document_store = AlloyDBDocumentStore(
    db="my-database",
    user=Secret.from_env_var("ALLOYDB_IAM_USER"),  # e.g. "my-sa@my-project.iam"
    enable_iam_auth=True,
    embedding_dimension=768,
)

You can also choose the IP type used by the connector (PRIVATE, PUBLIC, or PSC) depending on your network configuration.

Writing Documents to AlloyDBDocumentStore

To write documents to AlloyDBDocumentStore, create an indexing pipeline.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
file_paths = ["example.txt"]  # placeholder: replace with paths to your own text files
indexing.run({"converter": {"sources": file_paths}})

You can retrieve documents that are semantically similar to a given query using a pipeline that includes the AlloyDBEmbeddingRetriever.

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.alloydb import AlloyDBEmbeddingRetriever

querying = Pipeline()
querying.add_component("embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", AlloyDBEmbeddingRetriever(document_store=document_store, top_k=3))
querying.connect("embedder", "retriever")

results = querying.run({"embedder": {"text": "my query"}})
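
A Haystack pipeline returns a dictionary keyed by component name, and a retriever places its hits under `documents`. The sketch below shows one way to read that result. `FakeDocument` is a stand-in for `haystack.Document` so the snippet runs without a database, and `format_hits` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for haystack.Document with only the two fields used here."""
    content: str
    score: float


def format_hits(results, component="retriever"):
    """Return one 'score  content' line per retrieved document."""
    documents = results[component]["documents"]
    return [f"{doc.score:.3f}  {doc.content}" for doc in documents]


# Same shape as the value returned by querying.run(...) above.
sample = {
    "retriever": {
        "documents": [FakeDocument("Paris is the capital of France.", 0.91)]
    }
}
print("\n".join(format_hits(sample)))
```

With a real pipeline, pass the `results` value directly: `format_hits(results)`.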

You can also retrieve documents based on full-text keyword matching with the AlloyDBKeywordRetriever, which uses PostgreSQL’s tsvector/tsquery.

from haystack_integrations.components.retrievers.alloydb import AlloyDBKeywordRetriever

retriever = AlloyDBKeywordRetriever(document_store=document_store, top_k=3)
results = retriever.run(query="capital France")
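
As an illustration of the tsvector/tsquery mechanism, the sketch below builds a ranking query of the general shape used for PostgreSQL full-text search. The table name, column names, and the `english` text search configuration are assumptions, and the query the integration actually issues may differ. `plainto_tsquery` combines the query terms with AND, so "capital France" matches only documents that contain both terms after stemming.

```python
def keyword_query_sql(table="haystack_documents", language="english", top_k=3):
    """Build an illustrative full-text ranking query (names are hypothetical).

    %s is the placeholder for the user's query text, to be bound by the driver.
    """
    return (
        f"SELECT id, content, "
        f"ts_rank_cd(to_tsvector('{language}', content), query) AS score "
        f"FROM {table}, plainto_tsquery('{language}', %s) AS query "
        f"WHERE to_tsvector('{language}', content) @@ query "
        f"ORDER BY score DESC LIMIT {int(top_k)};"
    )


print(keyword_query_sql())
```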

HNSW Index

For large datasets, the HNSW index provides approximate nearest-neighbour search, trading a small amount of recall for significantly higher query throughput than an exact scan:

document_store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
    search_strategy="hnsw",
    hnsw_index_creation_kwargs={"m": 16, "ef_construction": 64},
    hnsw_ef_search=40,
)
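
To make these parameters concrete, the sketch below builds the pgvector statements they correspond to. It is an illustration only: the table, column, and index names are hypothetical, the cosine operator class is an assumption, and the SQL the integration actually issues may differ. The `WITH (m, ef_construction)` clause and the `hnsw.ef_search` setting are standard pgvector syntax.

```python
def hnsw_statements(table, column="embedding", m=16, ef_construction=64, ef_search=40):
    """Build illustrative pgvector SQL for an HNSW index (names are hypothetical)."""
    create_index = (
        f"CREATE INDEX IF NOT EXISTS {table}_hnsw_idx ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
    # ef_search is a session setting applied at query time, not an index option.
    set_ef_search = f"SET hnsw.ef_search = {ef_search};"
    return create_index, set_ef_search


for statement in hnsw_statements("haystack_documents"):
    print(statement)
```

Higher `m` and `ef_construction` values generally improve recall at the cost of build time and index size, while a higher `ef_search` improves recall at the cost of query speed. The values shown above match pgvector's defaults.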

Examples

You can find code examples showing how to use the Document Store and the Retrievers under the examples/ folder of this repo.

License

alloydb-haystack is distributed under the terms of the Apache-2.0 license.