Maintained by deepset

Integration: ArangoDB

Use the ArangoDB database as a Document Store with Haystack

Authors
deepset

Overview

ArangoDB is an open-source, multi-model database that combines documents, graphs, and key/values with native vector search. This integration lets you use ArangoDB as a Document Store in Haystack and retrieve documents with vector similarity search, which makes it a good fit for RAG and GraphRAG pipelines.

The integration provides two components:

  • ArangoDocumentStore: a Document Store that stores documents (including their embeddings) in an ArangoDB collection and implements the DocumentStore protocol.
  • ArangoEmbeddingRetriever: a retriever that fetches the most relevant documents from an ArangoDocumentStore using vector similarity on embeddings.

Installation

Vector search requires ArangoDB 3.12 or later with the vector index enabled. You can quickly start a local instance with Docker:

docker run -e ARANGO_ROOT_PASSWORD=test-password -p 8529:8529 arangodb:3.12 arangod --vector-index
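
Before wiring up the Document Store, it can help to confirm that the server is reachable and recent enough for vector search. The following is a minimal sketch that queries ArangoDB's `GET /_api/version` HTTP endpoint using only the Python standard library and compares the reported version against 3.12. The helper names (`parse_version`, `supports_vector_search`, `fetch_server_version`) are illustrative and not part of the integration. The sketch does not detect whether the server was started with `--vector-index`.

```python
import base64
import json
import re
import urllib.request


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.12.4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_vector_search(version: str) -> bool:
    """Return True if the version is 3.12 or later."""
    return parse_version(version) >= (3, 12)


def fetch_server_version(host: str, username: str, password: str) -> str:
    """Ask the server for its version via GET /_api/version (HTTP basic auth)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{host}/_api/version",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["version"]
```

Against the container started above, `fetch_server_version("http://localhost:8529", "root", "test-password")` should return the server's version string, which can then be passed to `supports_vector_search`.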

Install the integration with pip:

pip install arangodb-haystack

Usage

By default, the ArangoDocumentStore reads its credentials from the ARANGO_USERNAME (optional, falls back to the root user) and ARANGO_PASSWORD environment variables:

export ARANGO_PASSWORD="test-password"

Then initialize the Document Store:

from haystack_integrations.document_stores.arangodb import ArangoDocumentStore

document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    database="haystack",
    collection_name="haystack_documents",
    embedding_dimension=768,
    similarity_function="cosine",
    recreate_collection=True,
)
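
Because the credentials come from the environment, a missing variable may only surface once the store tries to connect. As a hedged sketch, a small pre-flight check can fail early with a clearer message. The helper `read_arango_credentials` is hypothetical and not part of the integration; it only mirrors the behavior described above (required `ARANGO_PASSWORD`, optional `ARANGO_USERNAME` falling back to `root`).

```python
import os


def read_arango_credentials(env=None):
    """Return (username, password) from the environment, or raise if no password is set."""
    env = os.environ if env is None else env
    password = env.get("ARANGO_PASSWORD")
    if not password:
        raise RuntimeError("Set ARANGO_PASSWORD before creating the Document Store.")
    # ARANGO_USERNAME is optional; "root" mirrors the documented fallback.
    username = env.get("ARANGO_USERNAME") or "root"
    return username, password
```

Calling `read_arango_credentials()` before constructing `ArangoDocumentStore` makes a misconfigured shell obvious without contacting the database.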

Writing Documents to ArangoDocumentStore

To write documents to the ArangoDocumentStore, create an indexing pipeline that embeds and writes documents:

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

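# file_paths lists the text files to index; the path below is a placeholder.
file_paths = ["path/to/document.txt"]

indexing.run({"converter": {"sources": file_paths}})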

Retrieval from ArangoDocumentStore

You can retrieve documents that are semantically similar to a query with a pipeline that uses the ArangoEmbeddingRetriever:

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.arangodb import ArangoEmbeddingRetriever

querying = Pipeline()
querying.add_component("embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", ArangoEmbeddingRetriever(document_store=document_store, top_k=3))
querying.connect("embedder", "retriever")

results = querying.run({"embedder": {"text": "my query"}})
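
A Haystack pipeline returns a dictionary keyed by component name, so the retrieved documents are expected under `results["retriever"]["documents"]`. The sketch below formats them for inspection. `format_hits` is an illustrative helper, not part of the integration, and it relies only on the `score` and `content` attributes of Haystack `Document` objects.

```python
def format_hits(results, max_chars=80):
    """Return one line per retrieved document: similarity score and a content preview."""
    lines = []
    for doc in results["retriever"]["documents"]:
        # score may be None if the retriever did not attach one
        score = f"{doc.score:.3f}" if doc.score is not None else "n/a"
        preview = (doc.content or "")[:max_chars]
        lines.append(f"{score}  {preview}")
    return lines
```

For example, `print("\n".join(format_hits(results)))` lists the three documents requested with `top_k=3`.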

The retriever also supports metadata filtering, which you can pass either at initialization or at query time.
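
As a sketch of what such a filter might look like, the helper below builds a dictionary in Haystack's standard filter syntax (a logical `operator` with a list of `conditions`). The metadata fields `meta.language` and `meta.year` are hypothetical, and the `filters` parameter name in the comments is assumed from the convention of other Haystack retrievers rather than confirmed for `ArangoEmbeddingRetriever`.

```python
def language_and_year_filter(language, min_year):
    """Build a Haystack-style filter: language must match AND year must be >= min_year."""
    return {
        "operator": "AND",
        "conditions": [
            # Hypothetical metadata fields; replace with the keys your documents carry.
            {"field": "meta.language", "operator": "==", "value": language},
            {"field": "meta.year", "operator": ">=", "value": min_year},
        ],
    }


filters = language_and_year_filter("en", 2023)

# Assumed usage at initialization:
#   ArangoEmbeddingRetriever(document_store=document_store, filters=filters)
# Assumed usage at query time:
#   querying.run({"embedder": {"text": "my query"}, "retriever": {"filters": filters}})
```

Passing the filter at query time is convenient when the constraint changes per request, while setting it at initialization suits a fixed scope.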

License

arangodb-haystack is distributed under the terms of the Apache-2.0 license.