Integration: ArcadeDB
Use ArcadeDB as a document store with native HNSW vector search for Haystack
Overview
An integration of ArcadeDB with Haystack by ArcadeData.
Most RAG setups need separate backends for documents, vectors, and metadata search. ArcadeDB replaces all three in a single multi-model database:
- Document storage: vertex-based records with flexible MAP metadata
- HNSW vector search: native approximate nearest neighbor index via vectorNeighbors() (cosine, euclidean, dot product)
- SQL filtering: full SQL WHERE clauses on metadata fields
- No special drivers: pure HTTP/JSON API, no binary protocol or custom driver required
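The three similarity functions named above can be illustrated with a minimal pure-Python sketch. This is not ArcadeDB's implementation, and the function names are hypothetical. It only shows what each metric computes, plus an exact brute-force ranking, which is the result an HNSW index approximates without scanning every vector.

```python
import math
from typing import List, Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger values mean more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller values mean more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after normalizing their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


def brute_force_neighbors(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Exact top-k indices by cosine similarity (hypothetical helper).

    An approximate index such as HNSW aims to return a similar ranking
    while visiting only a fraction of the stored vectors.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

Cosine similarity ignores vector length, so it is a common default for sentence embeddings. Dot product and Euclidean distance are sensitive to magnitude.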
The library provides an ArcadeDBDocumentStore that implements the Haystack
DocumentStore protocol, plus pipeline-ready retriever components:
- ArcadeDBDocumentStore: stores Documents as ArcadeDB vertices with embeddings indexed by a dedicated HNSW Vector Index for dense retrieval.
- ArcadeDBEmbeddingRetriever: a retriever component that queries the vector index to find related Documents, with support for metadata filtering and runtime parameter overrides.
                                   +-----------------------------+
                                   |      ArcadeDB Database      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
|  ArcadeDBDocument  |             |               |             |
|       Store        |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |     +---------+---------+   |
          |                        |     | HNSW Vector Index |   |
          +----------------------->|     |                   |   |
             _embedding_retrieval  |     |  (for embedding)  |   |
                                   |     +-------------------+   |
                                   |                             |
                                   +-----------------------------+
In the above diagram:
- Document is an ArcadeDB vertex type
- properties are Document attributes stored as vertex properties
- embedding is a vector property of type LIST[FLOAT], indexed by ArcadeDB's native HNSW index
- HNSW Vector Index provides approximate nearest neighbor search via vectorNeighbors()
Installation
arcadedb-haystack can be installed using pip:
pip install arcadedb-haystack
Usage
Once installed, you can use ArcadeDBDocumentStore like any other document store that supports embeddings.
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,
    similarity_function="cosine",
)
You will need a running ArcadeDB instance. The simplest way is with Docker:
docker run -d -p 2480:2480 \
  -e JAVA_OPTS="-Darcadedb.server.rootPassword=arcadedb" \
  arcadedata/arcadedb:latest
Set credentials via environment variables:
export ARCADEDB_USERNAME=root
export ARCADEDB_PASSWORD=arcadedb
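Because the credentials come from the environment, it can help to fail early with a clear message when a variable is missing. The following is a minimal sketch. The helper name read_arcadedb_credentials is hypothetical and not part of the integration. It only checks the two variable names shown above.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("ARCADEDB_USERNAME", "ARCADEDB_PASSWORD")


def read_arcadedb_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password), or raise if either variable is unset or empty.

    Hypothetical helper: it does not talk to ArcadeDB, it only inspects the mapping.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ARCADEDB_USERNAME"], env["ARCADEDB_PASSWORD"]
```

Calling read_arcadedb_credentials() before constructing the document store turns a later authentication failure into an immediate, readable error.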
Writing documents
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
documents = [
    Document(
        content="ArcadeDB supports graphs, documents, and vectors.",
        meta={"source": "docs", "category": "database"},
    )
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
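The policy argument decides what happens when a document with the same id already exists. The toy model below sketches the semantics Haystack describes for skip, overwrite, and fail using a plain dictionary. It is not how ArcadeDBDocumentStore is implemented, and the names Policy and write are hypothetical stand-ins for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class Policy(Enum):
    """Hypothetical stand-in for the duplicate-handling choices."""

    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: Dict[str, str], docs: Iterable[Tuple[str, str]], policy: Policy) -> int:
    """Write (id, content) pairs into store and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is Policy.SKIP:
                continue  # keep the existing document untouched
            if policy is Policy.FAIL:
                raise ValueError(f"document {doc_id!r} already exists")
            # Policy.OVERWRITE falls through and replaces the stored content
        store[doc_id] = content
        written += 1
    return written
```

Overwrite suits re-indexing runs where content may have changed, while fail is useful when a duplicate indicates a bug in the ingestion step.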
Retrieving documents
ArcadeDBEmbeddingRetriever can be used in a pipeline to retrieve documents by querying the HNSW vector index with an embedded query, including metadata filtering:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,
)
# Index documents with embeddings
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"release_date": "2018-12-09"})
]
document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
document_embedder.warm_up()  # load the model before calling run() outside a pipeline
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings["documents"])
# Build retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
pipeline.add_component("retriever", ArcadeDBEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)
documents = result["retriever"]["documents"]
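Since metadata filters end up as SQL WHERE clauses, it may help to see how a Haystack-style filter dictionary could map onto one. The sketch below is an illustration only, not the integration's actual translator. The function name to_where, the :pN placeholder style, and the supported operator subset are all assumptions.

```python
import re
from typing import Any, Dict, List

# Comparison operators from the filter syntax mapped to SQL (assumed subset).
_COMPARISON = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}
_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def to_where(filters: Dict[str, Any], params: List[Any]) -> str:
    """Translate a filter dict into a WHERE fragment, collecting values in params.

    Hypothetical helper: values are never interpolated into the SQL text,
    they are appended to params and referenced by position as :p0, :p1, ...
    """
    op = filters["operator"]
    if op in ("AND", "OR"):
        parts = [to_where(condition, params) for condition in filters["conditions"]]
        return "(" + f" {op} ".join(parts) + ")"
    if op in _COMPARISON:
        field = filters["field"]
        if not _FIELD_NAME.match(field):
            # Field names go into the SQL text, so reject anything unusual.
            raise ValueError(f"invalid field name: {field!r}")
        params.append(filters["value"])
        return f"{field} {_COMPARISON[op]} :p{len(params) - 1}"
    raise ValueError(f"unsupported operator: {op}")
```

For the filter used in the pipeline above, this sketch yields the fragment release_date = :p0 with the date string as the single parameter. Nested AND/OR conditions produce parenthesized groups.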
More examples
You can find more examples in the repository:
- embedding_retrieval.py: Full workflow demonstrating document indexing and vector similarity retrieval with ArcadeDB.
License
arcadedb-haystack is distributed under the terms of the
Apache 2.0 license.
