Integration: ArcadeDB
Use ArcadeDB as a document store with native HNSW vector search for Haystack
Overview
An integration of ArcadeDB with Haystack by ArcadeData.
ArcadeDB is a multi-model database that combines document storage, HNSW vector search, and SQL-based metadata filtering:
- Document storage: vertex-based records with flexible MAP metadata
- HNSW vector search: native approximate nearest neighbor index via vectorNeighbors() (cosine, euclidean, dot product)
- SQL filtering: full SQL WHERE clauses on metadata fields
- No special drivers: pure HTTP/JSON API, no binary protocol or custom driver required
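Because everything goes over plain HTTP/JSON, you can also inspect the database with nothing but the Python standard library. The sketch below builds (but does not send) a SQL command request, assuming the standard ArcadeDB HTTP API endpoint `/api/v1/command/{database}` with HTTP Basic authentication. `build_sql_command` is a hypothetical helper, not part of arcadedb-haystack, and the `Document` type name in the example comment is taken from the diagram in this overview.

```python
import base64
import json
import urllib.request


def build_sql_command(base_url: str, database: str, sql: str,
                      username: str, password: str) -> urllib.request.Request:
    """Build (without sending) a POST to ArcadeDB's HTTP command endpoint."""
    # The HTTP API takes the query language and the statement as a JSON body
    body = {"language": "sql", "command": sql}
    # Standard HTTP Basic authentication: base64("user:password")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/command/{database}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Example (requires a running server; the JSON reply carries rows under "result"):
# request = build_sql_command("http://localhost:2480", "haystack",
#                             "SELECT count(*) AS total FROM Document", "root", "arcadedb")
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["result"])
```

Keeping request construction separate from sending makes the helper easy to test offline and to reuse with any HTTP client.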
The library provides an ArcadeDBDocumentStore that implements the Haystack DocumentStore protocol, plus a pipeline-ready retriever component:
- ArcadeDBDocumentStore: stores Documents as ArcadeDB vertices, with embeddings indexed by a dedicated HNSW vector index for dense retrieval.
- ArcadeDBEmbeddingRetriever: a retriever component that queries the vector index to find related Documents, with support for metadata filtering and runtime parameter overrides.
                                   +-----------------------------+
                                   |      ArcadeDB Database      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
               write_documents     |      +----------------+     |
          +----------------------->|      |   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
|  ArcadeDBDocument  |             |               |             |
|       Store        |             |               | index/query |
|                    |             |               |             |
+---------+----------+             |     +---------+---------+   |
          |                        |     | HNSW Vector Index |   |
          +----------------------->|     |                   |   |
           _embedding_retrieval    |     |  (for embedding)  |   |
                                   |     +-------------------+   |
                                   |                             |
                                   +-----------------------------+
In the above diagram:
- Document is an ArcadeDB vertex type.
- properties are Document attributes stored as vertex properties.
- embedding is a vector property of type LIST[FLOAT], indexed by ArcadeDB’s native HNSW index.
- HNSW Vector Index provides approximate nearest neighbor search via vectorNeighbors().
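Since the HNSW index returns approximate neighbors, a brute-force exact search over the same vectors is a handy baseline for spot-checking recall on small datasets. This is a pure-Python sketch that is independent of the integration: the metric names mirror the three listed in the overview, but apart from "cosine" they are not necessarily the exact strings `similarity_function` accepts, and `exact_top_k` and `recall_at_k` are hypothetical helpers.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def dot_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def neg_euclidean(a: Vector, b: Vector) -> float:
    # Negate the distance so that a higher score always means "more similar"
    return -math.dist(a, b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine,
    "dot_product": dot_product,
    "euclidean": neg_euclidean,
}


def exact_top_k(query: Vector, vectors: Dict[str, Vector], k: int,
                metric: str = "cosine") -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best (no index)."""
    score = METRICS[metric]
    scored = [(doc_id, score(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def recall_at_k(approximate_ids: List[str], exact: List[Tuple[str, float]]) -> float:
    """Fraction of the exact top-k ids that the approximate (HNSW) results also returned."""
    exact_ids = {doc_id for doc_id, _ in exact}
    if not exact_ids:
        return 1.0
    return len(exact_ids.intersection(approximate_ids)) / len(exact_ids)
```

In practice you would pass the ids of the Documents returned by ArcadeDBEmbeddingRetriever as `approximate_ids`, and run `exact_top_k` over the same stored embeddings and query embedding.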
Installation
arcadedb-haystack can be installed using pip:
pip install arcadedb-haystack
Usage
Once installed, you can use ArcadeDBDocumentStore like any other document store that supports embeddings.
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
url="http://localhost:2480",
database="haystack",
embedding_dimension=384,
similarity_function="cosine",
)
You will need a running ArcadeDB instance. The simplest way is with Docker:
docker run -d -p 2480:2480 \
-e JAVA_OPTS="-Darcadedb.server.rootPassword=arcadedb" \
arcadedata/arcadedb:latest
Then set the credentials via environment variables; the password must match the rootPassword passed to the container:
export ARCADEDB_USERNAME=root
export ARCADEDB_PASSWORD=arcadedb
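The container can take a few seconds to start, so you may want to wait for it before creating the document store. This polling sketch assumes your ArcadeDB version exposes the `GET /api/v1/ready` readiness endpoint, which answers with a 2xx status once the server accepts requests; `wait_until_ready` is a hypothetical helper, not part of arcadedb-haystack.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:2480",
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll the server's readiness endpoint until it answers 2xx or `timeout` expires."""
    url = f"{base_url.rstrip('/')}/api/v1/ready"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError (refused connection, error status), socket timeouts
            # and malformed responses all mean the server is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example:
# if not wait_until_ready():
#     raise RuntimeError("ArcadeDB did not become ready in time")
```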
Writing documents
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
documents = [
Document(
content="ArcadeDB supports graphs, documents, and vectors.",
meta={"source": "docs", "category": "database"},
)
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
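When you do not set one, Haystack derives a Document's id from its content and metadata (among other fields), so writing the same Document twice collides on the id and the `policy` argument decides what happens. The dict-backed sketch below mimics the documented meaning of three DuplicatePolicy members: SKIP keeps the stored copy, OVERWRITE replaces it, and FAIL raises. It is an illustration only, not how ArcadeDBDocumentStore implements writes; `Policy`, `write`, and `DuplicateIdError` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Policy(Enum):
    """Stand-in for the SKIP/OVERWRITE/FAIL members of Haystack's DuplicatePolicy."""
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateIdError(ValueError):
    """Raised under Policy.FAIL (Haystack itself raises DuplicateDocumentError)."""


def write(store: Dict[str, str], docs: Dict[str, str], policy: Policy) -> int:
    """Write id -> content pairs into a dict-backed store; return how many were written."""
    duplicates = sorted(doc_id for doc_id in docs if doc_id in store)
    if duplicates and policy is Policy.FAIL:
        # This sketch checks the whole batch up front, so nothing is written on failure
        raise DuplicateIdError(f"ids already present: {duplicates}")
    written = 0
    for doc_id, content in docs.items():
        if doc_id in store and policy is Policy.SKIP:
            continue  # SKIP leaves the stored version untouched
        store[doc_id] = content  # new id, or OVERWRITE replacing the stored version
        written += 1
    return written
```

Whether a real store validates the batch up front or fails part-way through can differ between implementations.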
Retrieving documents
ArcadeDBEmbeddingRetriever can be used in a pipeline to retrieve documents by querying the HNSW vector index with an embedded query, including metadata filtering:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
url="http://localhost:2480",
database="haystack",
embedding_dimension=384,  # must match the embedder's output size (all-MiniLM-L6-v2 produces 384-dim vectors)
)
# Index documents with embeddings
documents = [
Document(content="My name is Morgan and I live in Paris.", meta={"release_date": "2018-12-09"})
]
document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
document_embedder.warm_up()  # load the model before calling run() outside a pipeline
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings["documents"])
# Build retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
pipeline.add_component("retriever", ArcadeDBEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
result = pipeline.run(
data={
"text_embedder": {"text": "What cities do people live in?"},
"retriever": {
"top_k": 5,
"filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
},
}
)
documents = result["retriever"]["documents"]
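The `filters` argument uses Haystack's filter dictionaries: comparison nodes with `field`/`operator`/`value`, and logical nodes with `operator` (AND, OR, NOT) and `conditions`. To make the "SQL-based metadata filtering" idea concrete, here is a hypothetical translation of such a dictionary into a parameterized WHERE fragment. The helper names, the `meta` MAP property path, the bracketed IN list, and the `?` placeholder style are all assumptions for illustration; the integration's own translation may differ, and this sketch does not reproduce Haystack's special handling of `None` values. Field names are accepted with or without the `meta.` prefix, since the pipeline example above uses a bare `release_date`.

```python
from typing import Any, Dict, List, Tuple

# Haystack comparison operators mapped to SQL operators
_COMPARISON = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}


def _column(field: str) -> str:
    """Map a filter field to a property path (assumes metadata lives in a MAP called `meta`)."""
    key = field[len("meta."):] if field.startswith("meta.") else field
    if not key.isidentifier():
        # Only plain identifiers are spliced into the SQL text
        raise ValueError(f"unsupported field name: {field!r}")
    return f"meta.{key}"


def filter_to_sql(node: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate a Haystack filter dict into a WHERE fragment with positional `?` params."""
    operator = node["operator"]
    if "conditions" in node:  # logical node: AND / OR / NOT
        parts: List[str] = []
        params: List[Any] = []
        for child in node["conditions"]:
            child_sql, child_params = filter_to_sql(child)
            parts.append(child_sql)
            params.extend(child_params)
        if operator == "NOT":
            # Haystack's NOT negates the AND of its conditions
            return f"NOT ({' AND '.join(parts)})", params
        if operator not in ("AND", "OR"):
            raise ValueError(f"unsupported logical operator: {operator!r}")
        return "(" + f" {operator} ".join(parts) + ")", params
    column = _column(node["field"])  # comparison node
    value = node["value"]
    if operator in _COMPARISON:
        return f"{column} {_COMPARISON[operator]} ?", [value]
    if operator in ("in", "not in"):
        keyword = "IN" if operator == "in" else "NOT IN"
        placeholders = ", ".join("?" for _ in value)
        return f"{column} {keyword} [{placeholders}]", list(value)
    raise ValueError(f"unsupported comparison operator: {operator!r}")
```

The design choice worth keeping in any real translator is that values travel as bound parameters instead of being concatenated into the SQL string, while field names are validated before they are spliced in.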
More examples
You can find more examples in the repository:
- embedding_retrieval.py: full workflow demonstrating document indexing and vector similarity retrieval with ArcadeDB.
License
arcadedb-haystack is distributed under the terms of the
Apache 2.0 license.
