Maintained by deepset

Integration: FalkorDB

Use FalkorDB as a document store with native vector search for GraphRAG workloads in Haystack

Authors
deepset

Overview

An integration of FalkorDB with Haystack by deepset.

FalkorDB is a high-performance graph database optimized for GraphRAG workloads. It stores documents as graph nodes and supports native vector search; no APOC is required. All bulk writes use UNWIND + MERGE for safe, idiomatic OpenCypher upserts.
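
To make the UNWIND + MERGE pattern concrete, here is a minimal sketch of what a bulk upsert could look like. The query text, the `to_rows` helper, and the plain-dict document shape are illustrative assumptions, not the Cypher that the integration actually emits.

```python
from typing import Any

# Hypothetical upsert query: one round trip for the whole batch.
# MERGE matches on id, so re-writing a document updates it instead of duplicating it.
UPSERT_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (d:Document {id: row.id}) "
    "SET d += row"
)


def to_rows(documents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten each document so meta fields sit next to id and content."""
    rows = []
    for doc in documents:
        # Meta goes first so that id and content always win on a key clash.
        row = {**doc.get("meta", {}), "id": doc["id"], "content": doc["content"]}
        rows.append(row)
    return rows


rows = to_rows(
    [
        {
            "id": "doc-1",
            "content": "FalkorDB stores documents as graph nodes.",
            "meta": {"source": "docs"},
        }
    ]
)
params = {"rows": rows}  # would be passed alongside UPSERT_QUERY to a graph client
```

Sending the batch as a single `$rows` parameter keeps the query text constant, which is what makes the upsert both efficient and safe against injection.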

The library provides a FalkorDBDocumentStore that implements the Haystack DocumentStore protocol, plus two pipeline-ready retriever components:

  • FalkorDBDocumentStore: stores Documents as labeled graph nodes in a named FalkorDB graph, with meta fields stored flat alongside id and content. Embeddings are indexed using FalkorDB’s native vector index.
  • FalkorDBEmbeddingRetriever: a retriever component that queries the native vector index to find Documents by dense similarity, with support for metadata filtering.
  • FalkorDBCypherRetriever: a power-user retriever for executing arbitrary OpenCypher queries, enabling graph traversal and multi-hop queries in GraphRAG pipelines.

                                   +-----------------------------+
                                   |      FalkorDB Database      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| FalkorDBDocument   |             |               |             |
|       Store        |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |     +---------+---------+   |
          |                        |     | Native Vector Idx |   |
          +----------------------->|     |                   |   |
              _embedding_retrieval |     |  (vecf32 index)   |   |
                                   |     +-------------------+   |
                                   |                             |
                                   +-----------------------------+

In the above diagram:

  • Document is a FalkorDB node with a configurable label (default: "Document")
  • properties are Document attributes and meta fields stored flat on the node
  • embedding is stored as a vecf32 vector property indexed by FalkorDB’s native vector index
  • The native vector index enables approximate nearest neighbor search via db.idx.vector.queryNodes
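
As a rough illustration of the last point, the sketch below builds the kind of Cypher call that asks the vector index for the nearest nodes. The `build_vector_query` helper and the parameter names `$k` and `$query_vector` are assumptions made for this example; the query the integration generates internally may differ.

```python
def build_vector_query(label: str = "Document", attribute: str = "embedding") -> str:
    """Return a Cypher query that asks the vector index for the k nearest nodes.

    Labels and property names cannot be passed as query parameters, so they are
    interpolated here and must come from trusted configuration, never user input.
    """
    return (
        f"CALL db.idx.vector.queryNodes('{label}', '{attribute}', $k, vecf32($query_vector)) "
        "YIELD node, score "
        "RETURN node, score"
    )


query = build_vector_query()
# The vector length must match the dimension the index was created with;
# three values are used here only to keep the example short.
params = {"k": 5, "query_vector": [0.1, 0.2, 0.3]}
```

The query vector and `k` travel as parameters, so only the label and property name are ever part of the query text.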

Installation

falkordb-haystack can be installed using pip:

pip install falkordb-haystack

You will need a running FalkorDB instance. The simplest way is with Docker:

docker run -d -p 6379:6379 falkordb/falkordb:latest

Usage

from haystack_integrations.document_stores.falkordb import FalkorDBDocumentStore

document_store = FalkorDBDocumentStore(
    host="localhost",
    port=6379,
    embedding_dim=384,
    similarity="cosine",
)

Writing documents

from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

documents = [
    Document(
        content="FalkorDB is a high-performance graph database for GraphRAG.",
        meta={"source": "docs", "category": "database"},
    )
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)

Retrieving documents

FalkorDBEmbeddingRetriever can be used in a pipeline to retrieve documents by querying the native vector index with an embedded query, with optional metadata filtering:

from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.document_stores.falkordb import FalkorDBDocumentStore
from haystack_integrations.components.retrievers.falkordb import FalkorDBEmbeddingRetriever

document_store = FalkorDBDocumentStore(
    host="localhost",
    port=6379,
    embedding_dim=384,
    recreate_graph=True,
)

documents = [
    Document(
        content="My name is Morgan and I live in Paris.",
        meta={"release_date": "2018-12-09"},
    )
]

document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings["documents"])

pipeline = Pipeline()
pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
pipeline.add_component(
    "retriever",
    FalkorDBEmbeddingRetriever(document_store=document_store),
)
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)

documents = result["retriever"]["documents"]
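
The filter above is a single comparison. Haystack's filter syntax also allows comparisons to be grouped under a logical operator, so a date range could be sketched as follows. The `date_range_filter` helper is hypothetical, and whether this integration supports `AND`, `>=`, and `<=` on string-valued meta fields is an assumption to verify against your installed version.

```python
from typing import Any


def date_range_filter(field: str, start: str, end: str) -> dict[str, Any]:
    """Build an AND filter that keeps documents whose field lies in [start, end]."""
    return {
        "operator": "AND",
        "conditions": [
            {"field": field, "operator": ">=", "value": start},
            {"field": field, "operator": "<=", "value": end},
        ],
    }


# ISO 8601 date strings sort in chronological order when compared as text.
filters = date_range_filter("release_date", "2018-01-01", "2018-12-31")
```

The resulting dictionary would be passed as the retriever's `filters` value in `pipeline.run`, in place of the single-condition filter.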

Graph queries with Cypher

FalkorDBCypherRetriever allows you to run arbitrary OpenCypher queries against the graph, which is useful for multi-hop traversals and custom GraphRAG patterns. Use parameterized queries to avoid injection vulnerabilities:

from haystack_integrations.document_stores.falkordb import FalkorDBDocumentStore
from haystack_integrations.components.retrievers.falkordb import FalkorDBCypherRetriever

document_store = FalkorDBDocumentStore(host="localhost", port=6379)

retriever = FalkorDBCypherRetriever(
    document_store=document_store,
    custom_cypher_query="MATCH (d:Document {topic: $topic}) RETURN d",
)

result = retriever.run(parameters={"topic": "GraphRAG"})
documents = result["documents"]
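
To see why parameters matter, the following standalone sketch contrasts string interpolation with a parameterized query. It does not touch the database; the `untrusted` value is a made-up example of hostile input.

```python
# Hypothetical hostile input that closes the string literal and appends a clause.
untrusted = "GraphRAG'}) DETACH DELETE d //"

# Unsafe: the input becomes part of the query text and changes its meaning.
unsafe_query = f"MATCH (d:Document {{topic: '{untrusted}'}}) RETURN d"

# Safe: the query text is fixed, and the input is sent separately as data.
safe_query = "MATCH (d:Document {topic: $topic}) RETURN d"
safe_params = {"topic": untrusted}

print(unsafe_query)
print(safe_query, safe_params)
```

In the parameterized form the hostile string is only ever compared against the `topic` property, so it cannot add clauses to the query.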

License

falkordb-haystack is distributed under the terms of the Apache 2.0 license.