Integration: Valkey
Use a Valkey database with Haystack
Overview
Valkey is a high-performance, in-memory data structure store that you can use in Haystack pipelines with the ValkeyDocumentStore. Valkey operates in-memory by default for maximum performance, but can be configured with persistence options for data durability.
For a detailed overview of all the available methods and settings for the ValkeyDocumentStore, visit the Haystack
API Reference.
Installation
pip install valkey-haystack
Usage
To use Valkey as the data store for your Haystack LLM pipelines, you need a running Valkey server with the search module enabled. Learn how to spin up a Valkey server in the
Running Valkey-Haystack Locally section. Once the server is running, you can initialize a ValkeyDocumentStore for Haystack:
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore
document_store = ValkeyDocumentStore(
    nodes_list=[("localhost", 6379)],
    index_name="my_documents",
    embedding_dim=768,
    distance_metric="cosine"
)
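To make the embedding_dim and distance_metric settings more concrete, here is a minimal pure-Python sketch of cosine similarity over fixed-length vectors. It only illustrates the metric. It is not how Valkey's search module computes it, and the names cosine_similarity and EMBEDDING_DIM are hypothetical helpers introduced for this illustration.

```python
import math

# Hypothetical constant: mirrors embedding_dim=768 from the example above.
EMBEDDING_DIM = 768


def cosine_similarity(a, b, dim=EMBEDDING_DIM):
    """Return the cosine similarity of two vectors that both have length `dim`."""
    if len(a) != dim or len(b) != dim:
        raise ValueError(f"expected vectors of length {dim}, got {len(a)} and {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

The length check reflects the practical constraint behind embedding_dim: the index dimension has to agree with the output size of whichever embedding model you use.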
Writing Documents to ValkeyDocumentStore
To write documents to your ValkeyDocumentStore, create an indexing pipeline or call the write_documents() method directly.
For this step, you can use the available
Converters and
PreProcessors, as well as other integrations that help you fetch data from other sources. Below is an example indexing pipeline that indexes your Markdown files into a Valkey database.
Indexing Pipeline
from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore
document_store = ValkeyDocumentStore(
    nodes_list=[("localhost", 6379)],
    index_name="my_documents",
    embedding_dim=768,
    distance_metric="cosine"
)
indexing = Pipeline()
indexing.add_component("converter", MarkdownToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["filename.md"]}})
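As an alternative to the pipeline, write_documents() can be called on the store directly. The sketch below wraps that call in a small batching helper. The helper name write_in_batches and the batch_size default are hypothetical, and the code assumes write_documents() returns the number of documents written, as Haystack's DocumentStore protocol describes.

```python
def write_in_batches(document_store, documents, batch_size=100):
    """Write `documents` through `document_store.write_documents` in chunks.

    Assumption: `write_documents` returns the count of documents written.
    Returns the total count reported across all batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    written = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        written += document_store.write_documents(batch)
    return written
```

With a store like the one above, a call would look like write_in_batches(document_store, docs), where docs is a list of Haystack Document objects that already carry embeddings.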
Using Valkey in a RAG Pipeline
Once you have documents in your ValkeyDocumentStore, you can use them in any Haystack pipeline. Use
ValkeyEmbeddingRetriever to retrieve data from your ValkeyDocumentStore. For example, below is a pipeline that uses a custom prompt designed to answer questions from the retrieved documents.
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore
from haystack_integrations.components.retrievers.valkey import ValkeyEmbeddingRetriever
document_store = ValkeyDocumentStore(
    nodes_list=[("localhost", 6379)],
    index_name="my_documents",
    embedding_dim=768,
    distance_metric="cosine"
)
prompt_template = """Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.\n
Query: {{query}}
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Answer:
"""
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", ValkeyEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
query_pipeline.add_component("generator", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="gpt-4"))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")
query = "What is Valkey?"
results = query_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
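The value returned by run() is a dictionary keyed by component name, and OpenAIGenerator places its output under the "replies" key. A small helper for reading the answer could look like the sketch below. The function name first_reply is hypothetical.

```python
def first_reply(results, component="generator"):
    """Return the first generated reply from a pipeline result, or None if absent."""
    replies = results.get(component, {}).get("replies", [])
    return replies[0] if replies else None
```

For the pipeline above, print(first_reply(results)) would show the generated answer, and None signals that the generator produced no reply.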
For more examples, see the examples folder in the repository.
For more advanced configurations and clustering setups, refer to the Valkey documentation.
Running Valkey-Haystack Locally
To set up Valkey for development and testing with valkey-haystack:
- Start Valkey server:
docker run -d -p 6379:6379 valkey/valkey-bundle:latest
- Set up development environment:
git clone https://github.com/deepset-ai/haystack-core-integrations
cd haystack-core-integrations/integrations/valkey
- Run tests:
# Run unit tests only
hatch run test:unit
# Run integration tests (requires Valkey instance)
hatch run test:integration
# Run all tests
hatch run test:all
- Run examples:
uv sync --group examples
# Basic usage example
uv run examples/basic_usage.py
# Pipeline example
uv run examples/example.py
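Before running the integration tests or examples, it can help to confirm that the server started in the first step is reachable. The following is a minimal sketch that sends PING over a plain socket and expects a +PONG reply. The function name valkey_is_reachable is hypothetical, and the check assumes a server without authentication on the default port. It does not verify that the search module is loaded.

```python
import socket


def valkey_is_reachable(host="localhost", port=6379, timeout=2.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command form of the wire protocol
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False
    return reply.startswith(b"+PONG")
```

A False result means either that nothing is listening on that address or that the server answered with something other than +PONG, for example an authentication error.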
Performance Benefits
- In-Memory Storage: Fast read/write operations, since data is served from memory
- High Throughput: Handles thousands of operations per second
- Low Latency: Minimal response times for document operations
- Scalability: Supports clustering for horizontal scaling
Requirements
- Valkey server with search module running and accessible
- Python 3.10+
- Haystack 2.11+
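The Python and Haystack version requirements can be checked programmatically. The sketch below is one possible way to do it with the standard library. The function names are hypothetical, and it assumes Haystack 2.x is installed under its PyPI distribution name haystack-ai. It does not check the Valkey server itself.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '2.11.0' into (2, 11, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(min_python=(3, 10), min_haystack=(2, 11)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    try:
        installed = version_tuple(version("haystack-ai"))
    except PackageNotFoundError:
        problems.append("haystack-ai is not installed")
    else:
        if installed < min_haystack:
            problems.append(f"Haystack {min_haystack[0]}.{min_haystack[1]}+ required")
    return problems
```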
License
valkey-haystack is distributed under the terms of the
Apache-2.0 license.
