Integration: Vespa
Use the Vespa search engine as a document store with Haystack
Overview
Vespa is an open-source search engine and vector database that supports
vector search, lexical search, and search in structured data, all in the same query. This
integration lets you use Vespa as a DocumentStore in Haystack pipelines and provides
retrievers for both embedding-based and keyword-based search.
It is built on top of pyvespa and expects a Vespa application to be running and reachable (locally via Docker, on Vespa Cloud, or self-hosted). The Vespa schema, including the fields and ranking profiles used by the retrievers, must be defined on the Vespa application before you start indexing or querying.
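Because the integration expects a reachable Vespa application, it can help to check the endpoint before building pipelines. The following is a minimal sketch, not part of the integration: it assumes an unauthenticated, locally running container (for example `http://localhost:8080`, an assumed address) and uses Vespa's `/state/v1/health` endpoint. The helper names `health_url`, `is_up`, and `vespa_is_reachable` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health endpoint URL from a base endpoint such as http://localhost:8080."""
    return base_url.rstrip("/") + "/state/v1/health"


def is_up(payload) -> bool:
    """Return True only if the health payload reports the status code 'up'."""
    status = payload.get("status") if isinstance(payload, dict) else None
    return isinstance(status, dict) and status.get("code") == "up"


def vespa_is_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Query the health endpoint; any connection or parsing error counts as not reachable."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return is_up(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Calling `vespa_is_reachable("http://localhost:8080")` returns `False` instead of raising when nothing is listening. Endpoints that require credentials, such as Vespa Cloud, would likely need an authenticated client rather than this plain HTTP request.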
When connecting to
Vespa Cloud, VespaDocumentStore supports either
token-based authentication via vespa_cloud_secret_token (or the VESPA_CLOUD_SECRET_TOKEN
environment variable) or mTLS authentication via the cert and key parameters pointing to
your data plane certificate and key files.
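The choice between the two authentication modes can be sketched as a small helper that returns extra keyword arguments for `VespaDocumentStore`. This is an illustration under assumptions, not the integration's own logic: `choose_auth_kwargs` is a hypothetical name, and the sketch assumes that `cert` and `key` accept file paths as strings. When the `VESPA_CLOUD_SECRET_TOKEN` variable is set, it returns nothing extra and relies on the store reading the token from the environment, as described above.

```python
import os


def choose_auth_kwargs(cert_path=None, key_path=None, env=None) -> dict:
    """Return extra keyword arguments for VespaDocumentStore (hypothetical helper).

    Token authentication takes precedence when VESPA_CLOUD_SECRET_TOKEN is set;
    otherwise both the certificate and the key path are required for mTLS.
    """
    env = os.environ if env is None else env
    if env.get("VESPA_CLOUD_SECRET_TOKEN"):
        # The store picks up the token from the environment, so nothing extra is passed.
        return {}
    if cert_path and key_path:
        # mTLS: paths to the data plane certificate and key files.
        return {"cert": str(cert_path), "key": str(key_path)}
    raise ValueError(
        "Set VESPA_CLOUD_SECRET_TOKEN or provide both cert_path and key_path."
    )
```

The result could then be unpacked into the constructor, for example `VespaDocumentStore(schema="doc", **choose_auth_kwargs(cert_path, key_path))`.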
Installation
pip install vespa-haystack
The integration requires Python 3.10+, haystack-ai>=2.28.0 and pyvespa>=0.58.0.
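To confirm that an environment meets these minimums, a check along the following lines may be useful. It is a rough sketch using only the standard library; the helper names are hypothetical, and the version comparison looks only at the first three numeric parts, so a pre-release such as `2.28.0rc1` is treated like `2.28.0`.

```python
import re
import sys
from importlib import metadata

# Minimum versions as stated in the requirements above.
MINIMUMS = {"haystack-ai": "2.28.0", "pyvespa": "0.58.0"}


def version_tuple(version: str) -> tuple:
    """Reduce a version string to three numeric parts, e.g. '2.28.0' -> (2, 28, 0)."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric parts only."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems
```

Printing the result of `check_requirements()` after installation lists anything that still needs upgrading.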
Usage
Components
This integration introduces the following components:
- VespaDocumentStore: a DocumentStore backed by a Vespa application. It connects to the Vespa endpoint (VESPA_URL by default) and reads/writes documents into the configured schema and namespace.
- VespaEmbeddingRetriever: retrieves documents from a VespaDocumentStore using vector similarity (nearest-neighbor search on the configured embedding field).
- VespaKeywordRetriever: retrieves documents from a VespaDocumentStore using Vespa's lexical search (e.g. BM25 ranking).
Indexing and embedding retrieval
from haystack import Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import Document
from haystack_integrations.components.retrievers.vespa import VespaEmbeddingRetriever
from haystack_integrations.document_stores.vespa import VespaDocumentStore

# The schema, namespace, and field names must match the schema
# already defined on the Vespa application.
document_store = VespaDocumentStore(
    schema="doc",
    namespace="doc",
    content_field="content",
    embedding_field="embedding",
    metadata_fields=["category"],
)

# Indexing pipeline: embed the documents, then write them to Vespa.
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder", "writer")
indexing.run({"embedder": {"documents": [
    Document(id="1", content="Haystack integrates with Vespa for search.", meta={"category": "docs"}),
]}})

# Query pipeline: embed the query text, then run nearest-neighbor retrieval.
querying = Pipeline()
querying.add_component("text_embedder", SentenceTransformersTextEmbedder())
querying.add_component(
    "retriever",
    VespaEmbeddingRetriever(
        document_store=document_store,
        top_k=2,
        query_tensor_name="query_embedding",
    ),
)
querying.connect("text_embedder", "retriever")

results = querying.run({"text_embedder": {"text": "semantic vector search"}})
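Haystack retrievers conventionally return their hits under a `documents` key, so the retrieved documents would typically be available as `results["retriever"]["documents"]`. The sketch below shows one way to print them. `format_hits` is a hypothetical helper, and the sample object is a stand-in with a made-up score so that the snippet runs without a Vespa instance.

```python
from types import SimpleNamespace


def format_hits(documents) -> list[str]:
    """Render one tab-separated line per document: id, score (if set), content."""
    lines = []
    for doc in documents:
        score = "n/a" if doc.score is None else f"{doc.score:.3f}"
        lines.append(f"{doc.id}\t{score}\t{doc.content}")
    return lines


# Stand-in with the same attributes as a Haystack Document; the score is invented
# for illustration. In a real pipeline, pass results["retriever"]["documents"].
sample = [
    SimpleNamespace(
        id="1",
        score=0.8731,
        content="Haystack integrates with Vespa for search.",
    )
]

for line in format_hits(sample):
    print(line)
```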
Keyword retrieval
from haystack import Pipeline
from haystack_integrations.components.retrievers.vespa import VespaKeywordRetriever
from haystack_integrations.document_stores.vespa import VespaDocumentStore

# No embedding field is configured here: keyword retrieval searches the content field.
document_store = VespaDocumentStore(
    schema="doc",
    namespace="doc",
    content_field="content",
    metadata_fields=["category", "author"],
)

querying = Pipeline()
querying.add_component(
    "retriever",
    VespaKeywordRetriever(
        document_store=document_store,
        top_k=2,
        # Only return documents whose "category" metadata equals "docs".
        filters={"field": "meta.category", "operator": "==", "value": "docs"},
    ),
)

results = querying.run({"retriever": {"query": "vector retrieval"}})
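The filter above is a single comparison in Haystack's filter syntax. The same syntax also allows comparisons to be combined with a logical operator, as in the sketch below. Whether this integration translates every operator to a Vespa query is an assumption here, `comparison` and `all_of` are hypothetical helpers, and the author value `alice` is invented for illustration.

```python
def comparison(field: str, operator: str, value) -> dict:
    """Build a single comparison in Haystack's filter syntax."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions so that every one of them must match (logical AND)."""
    return {"operator": "AND", "conditions": list(conditions)}


# Match documents in the "docs" category written by a given author.
filters = all_of(
    comparison("meta.category", "==", "docs"),
    comparison("meta.author", "==", "alice"),
)
print(filters)
```

A dictionary built this way could be passed as the `filters` argument of the retriever in place of the single comparison.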
License
vespa-haystack is distributed under the terms of the
Apache-2.0 license.
