Integration: ArangoDB
Use the ArangoDB database as a Document Store with Haystack
Overview
ArangoDB is an open-source, multi-model database that combines documents, graphs, and key/values with native vector search. This integration lets you use ArangoDB as a Document Store in Haystack and retrieve documents with vector similarity search, which makes it a good fit for RAG and GraphRAG pipelines.
The integration provides two components:
ArangoDocumentStore: a Document Store that stores documents (including their embeddings) in an ArangoDB collection and implements the DocumentStore protocol.
ArangoEmbeddingRetriever: a retriever that fetches the most relevant documents from an ArangoDocumentStore using vector similarity on embeddings.
Installation
Vector search requires ArangoDB 3.12 or later with the vector index enabled. You can quickly start a local instance with Docker:
docker run -e ARANGO_ROOT_PASSWORD=test-password -p 8529:8529 arangodb:3.12 arangod --vector-index
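To confirm that the container runs a version with vector search support, you can ask ArangoDB's HTTP API for its version. The following is a minimal standard-library sketch, not part of the integration: the helper names parse_version, supports_vector_search and fetch_server_version are hypothetical, and it assumes the GET /_api/version endpoint with basic authentication and the 3.12 minimum stated above.

```python
import base64
import json
import urllib.request


def parse_version(version: str) -> tuple:
    """Turn a version string such as "3.12.4" into (3, 12, 4)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component, e.g. "0-rc1" -> 0.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_vector_search(version: str, minimum: tuple = (3, 12)) -> bool:
    """True if the major.minor version is at least `minimum` (3.12 per this page)."""
    return parse_version(version)[:2] >= minimum


def fetch_server_version(host: str, username: str, password: str, timeout: float = 5.0) -> str:
    """Query GET /_api/version using HTTP basic authentication (assumed endpoint)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{host}/_api/version",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["version"]
```

Against the container started above, supports_vector_search(fetch_server_version("http://localhost:8529", "root", "test-password")) is expected to return True.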
Install the integration with pip:
pip install arangodb-haystack
Usage
By default, the ArangoDocumentStore reads its credentials from two environment variables: ARANGO_PASSWORD and, optionally, ARANGO_USERNAME (if it is unset, the root user is used):
export ARANGO_PASSWORD="test-password"
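In a notebook or script you can set the variable from Python instead of the shell. The helper below is a hypothetical sketch that mirrors the documented lookup (user name from ARANGO_USERNAME, else root; password from ARANGO_PASSWORD). It is not the integration's own code, and treating a missing password as an error is an assumption of this sketch.

```python
import os
from typing import Mapping, Tuple


def resolve_arango_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented lookup, not the library's code."""
    # ARANGO_USERNAME is optional; fall back to the root user when unset or empty.
    username = env.get("ARANGO_USERNAME") or "root"
    # Assumption of this sketch: a missing password is treated as an error.
    password = env.get("ARANGO_PASSWORD")
    if not password:
        raise ValueError("Set the ARANGO_PASSWORD environment variable")
    return username, password


# Python equivalent of `export ARANGO_PASSWORD="test-password"` (only if not already set).
os.environ.setdefault("ARANGO_PASSWORD", "test-password")
username, _password = resolve_arango_credentials(os.environ)
print(f"Connecting as {username}")
```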
Then initialize the Document Store:
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    database="haystack",
    collection_name="haystack_documents",
    embedding_dimension=768,  # must match the output size of your embedding model
    similarity_function="cosine",
    recreate_collection=True,  # recreates the collection, so previously stored documents are lost
)
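With similarity_function="cosine", documents are ranked by the cosine similarity between the query embedding q and each stored document embedding d. The standard definition is shown below; n is the vector length and must equal embedding_dimension (768 here, which matches the default sentence-transformers/all-mpnet-base-v2 model of the Haystack SentenceTransformers embedders used on this page). ArangoDB's internal evaluation of the metric may differ in detail.

$$
\mathrm{sim}(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}, \qquad n = 768
$$

Scores range from -1 to 1, and a higher score means a more similar document.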
Writing Documents to ArangoDocumentStore
To write documents to the ArangoDocumentStore, create an indexing pipeline that converts text files into documents, embeds them, and writes them to the store:
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
# Placeholder: replace with the .txt files you want to index
file_paths = ["path/to/your_file.txt"]
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())  # .txt files -> Documents
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())  # default model outputs 768-dim embeddings
indexing.add_component("writer", DocumentWriter(document_store))  # stores the embedded Documents in ArangoDB
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": file_paths}})
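The indexing pipeline expects file_paths to be a list of text files. One way to build it is to collect every .txt file under a folder. The helper below is a hypothetical standard-library sketch, and the folder name data is a placeholder.

```python
from pathlib import Path
from typing import List


def collect_text_files(root: str) -> List[Path]:
    """Hypothetical helper: return every .txt file under `root`, searched recursively."""
    # rglob walks subdirectories; is_file() skips directories whose names end in .txt.
    return sorted(p for p in Path(root).rglob("*.txt") if p.is_file())
```

For example, set file_paths = collect_text_files("data") before calling indexing.run(...). TextFileToDocument accepts Path objects as well as strings.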
Retrieval from ArangoDocumentStore
You can retrieve documents that are semantically similar to a query with a pipeline that uses the ArangoEmbeddingRetriever:
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.arangodb import ArangoEmbeddingRetriever
querying = Pipeline()
# Use the same embedding model as the indexing pipeline so query and document vectors are comparable
querying.add_component("embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", ArangoEmbeddingRetriever(document_store=document_store, top_k=3))  # return the 3 most similar documents
querying.connect("embedder", "retriever")
results = querying.run({"embedder": {"text": "my query"}})
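Conceptually, the retriever compares the query embedding with the stored document embeddings and returns the top_k closest documents. ArangoDB does this with its vector index, which may use approximate search; the pure-Python sketch below instead scores every document exactly (brute force), only to illustrate what top_k=3 with cosine similarity returns. The function names and the toy 2-dimensional vectors are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query: List[float], docs: Dict[str, List[float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force stand-in for vector retrieval: score all documents, keep the best top_k."""
    scored = ((doc_id, cosine_similarity(query, emb)) for doc_id, emb in docs.items())
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


docs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.0],
}
for doc_id, score in top_k_by_cosine([1.0, 0.2], docs, top_k=3):
    print(doc_id, round(score, 3))
```

Document d points away from the query, so it is the one left out of the top 3.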
The retriever also supports metadata filtering, which you can pass either at initialization or at query time.
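Haystack 2.x expresses metadata filters as nested dictionaries of conditions on fields such as meta.<key>. The sketch below builds such a filter and the run() input that would pass it at query time. It assumes that ArangoEmbeddingRetriever follows the usual Haystack convention of a filters init parameter and a filters run input, and that the document store supports the operators used; the meta fields language and year are hypothetical.

```python
# Haystack-style filter: meta.language == "en" AND meta.year >= 2023.
# The meta fields "language" and "year" are hypothetical examples.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

# Query time (assumed run input name "filters", following Haystack retriever conventions):
run_inputs = {
    "embedder": {"text": "my query"},
    "retriever": {"filters": filters},
}
# results = querying.run(run_inputs)

# Initialization time (assumed parameter name "filters"):
# ArangoEmbeddingRetriever(document_store=document_store, filters=filters, top_k=3)
```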
License
arangodb-haystack is distributed under the terms of the Apache-2.0 license.
