๐Ÿ”Ž Haystack 2.26 is here! LLMRanker for high-quality context & Jinja2 support for dynamic system prompts in Agents

Integration: ArcadeDB

Use ArcadeDB as a document store with native HNSW vector search for Haystack

Authors
ArcadeData Ltd

Table of Contents

Overview

An integration of ArcadeDB with Haystack by ArcadeData.

ArcadeDB is a multi-model database that combines document storage, HNSW vector search, and SQL-based metadata filtering:

  • Document storage โ€” vertex-based records with flexible MAP metadata
  • HNSW vector search โ€” native approximate nearest neighbor index via vectorNeighbors() (cosine, euclidean, dot product)
  • SQL filtering โ€” full SQL WHERE clauses on metadata fields
  • No special drivers โ€” pure HTTP/JSON API, no binary protocol or custom driver required

The library provides an ArcadeDBDocumentStore that implements the Haystack DocumentStore protocol, plus pipeline-ready retriever components:

  • ArcadeDBDocumentStore โ€” stores Documents as ArcadeDB vertices with embeddings indexed by a dedicated HNSW Vector Index for dense retrieval.
  • ArcadeDBEmbeddingRetriever โ€” a retriever component that queries the vector index to find related Documents, with support for metadata filtering and runtime parameter overrides.
                                   +-----------------------------+
                                   |      ArcadeDB Database      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| ArcadeDBDocument   |             |               |             |
|       Store        |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |     +---------+---------+   |
          |                        |     | HNSW Vector Index |   |
          +----------------------->|     |                   |   |
              _embedding_retrieval |     | (for embedding)   |   |
                                   |     +-------------------+   |
                                   |                             |
                                   +-----------------------------+

In the above diagram:

  • Document is an ArcadeDB vertex type
  • properties are Document attributes stored as vertex properties
  • embedding is a vector property of type LIST[FLOAT], indexed by ArcadeDB’s native HNSW index
  • HNSW Vector Index provides approximate nearest neighbor search via vectorNeighbors()

Installation

arcadedb-haystack can be installed using pip:

pip install arcadedb-haystack

Usage

Once installed, you can start using ArcadeDBDocumentStore as any other document store that supports embeddings.

from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,
    similarity_function="cosine",
)

You will need a running ArcadeDB instance. The simplest way is with Docker:

docker run -d -p 2480:2480 \
    -e JAVA_OPTS="-Darcadedb.server.rootPassword=arcadedb" \
    arcadedata/arcadedb:latest

Set credentials via environment variables:

export ARCADEDB_USERNAME=root
export ARCADEDB_PASSWORD=arcadedb

Writing documents

from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

documents = [
    Document(
        content="ArcadeDB supports graphs, documents, and vectors.",
        meta={"source": "docs", "category": "database"},
    )
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)

Retrieving documents

ArcadeDBEmbeddingRetriever can be used in a pipeline to retrieve documents by querying the HNSW vector index with an embedded query, including metadata filtering:

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,
)

# Index documents with embeddings
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"release_date": "2018-12-09"})
]

document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"))

# Build retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
pipeline.add_component("retriever", ArcadeDBEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)

documents = result["retriever"]["documents"]

More examples

You can find more examples in the repository:

  • embedding_retrieval.py โ€” Full workflow demonstrating document indexing and vector similarity retrieval with ArcadeDB.

License

arcadedb-haystack is distributed under the terms of the Apache 2.0 license.