Integration: Dewey
Connect Haystack pipelines to Dewey, a managed document intelligence backend that handles PDF conversion, chunking, embedding, and hybrid retrieval behind a single API.
Overview
Dewey is a managed document intelligence backend for AI applications. Upload PDFs, Word docs, and other files, and Dewey handles conversion, section extraction, chunking, embedding, and hybrid semantic + BM25 retrieval automatically.
This integration provides three Haystack 2.0 components:
- DeweyDocumentStore: implements the Haystack DocumentStore protocol, backed by a Dewey collection
- DeweyRetriever: a @component that runs hybrid search against a collection and returns ranked Document objects
- DeweyResearchComponent: a @component that runs Dewey's full agentic research loop (multi-step search, synthesis, citations) and returns a grounded Markdown answer
Installation
pip install dewey-haystack
Requires a free Dewey account at meetdewey.com. Set your API key:
export DEWEY_API_KEY="dwy_live_..."
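Before wiring up a pipeline, it can help to fail fast when the key is missing. A minimal sketch, assuming keys follow the "dwy_" prefix shown in the example above (the prefix check is an assumption, not a documented Dewey guarantee):

```python
import os

def dewey_key_configured() -> bool:
    """Return True when DEWEY_API_KEY is set and looks like a Dewey key.

    The 'dwy_' prefix is assumed from the example key above.
    """
    key = os.environ.get("DEWEY_API_KEY", "")
    return key.startswith("dwy_")
```

Calling this at startup gives a clearer error than a failed API request deep inside a pipeline run.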
Usage
Components
This integration introduces three components:
- DeweyDocumentStore (haystack_integrations.document_stores.dewey)
- DeweyRetriever (haystack_integrations.components.retrievers.dewey)
- DeweyResearchComponent (haystack_integrations.components.retrievers.dewey)
RAG pipeline with DeweyRetriever
from haystack import Pipeline
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack_integrations.components.retrievers.dewey import DeweyRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
store = DeweyDocumentStore(
api_key=Secret.from_env_var("DEWEY_API_KEY"),
collection_id="3f7a1b2c-...", # your collection ID
)
prompt_template = """
Answer the question using only the provided context.
Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ query }}
"""
pipeline = Pipeline()
pipeline.add_component("retriever", DeweyRetriever(document_store=store, top_k=5))
pipeline.add_component("prompt", PromptBuilder(template=prompt_template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
pipeline.connect("retriever.documents", "prompt.documents")
pipeline.connect("prompt.prompt", "llm.prompt")
result = pipeline.run({
"retriever": {"query": "What are the key findings?"},
"prompt": {"query": "What are the key findings?"},
})
print(result["llm"]["replies"][0])
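To see what the generator actually receives, here is a pure-Python stand-in for how PromptBuilder fills the Jinja template above (no Haystack required; plain dicts stand in for Document objects):

```python
def render_prompt(documents, query):
    """Mimic the template above: concatenate document contents, then append the question."""
    context = "".join(doc["content"] for doc in documents)
    return (
        "Answer the question using only the provided context.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )
```

This is only an illustration of the template's behavior; in the pipeline, PromptBuilder performs the real Jinja rendering.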
Agentic research with DeweyResearchComponent
DeweyResearchComponent is a drop-in replacement for an LLM generator when you want Dewey to handle both retrieval and generation. It runs a multi-step research loop internally and returns a grounded answer with cited sources.
from haystack import Pipeline
from haystack_integrations.components.retrievers.dewey import DeweyResearchComponent
from haystack.utils import Secret
pipeline = Pipeline()
pipeline.add_component(
"research",
DeweyResearchComponent(
api_key=Secret.from_env_var("DEWEY_API_KEY"),
collection_id="3f7a1b2c-...",
depth="balanced", # "quick" | "balanced" | "deep" | "exhaustive"
),
)
result = pipeline.run({"research": {"query": "What were the key findings across all studies?"}})
print(result["research"]["answer"])
for source in result["research"]["sources"]:
print(f" [{source.meta['filename']}] {source.content[:80]}...")
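When displaying results, you may want the cited sources as a deduplicated Markdown list rather than raw loop output. A small sketch, assuming each source exposes a filename in its metadata and a content string as in the example above (plain dicts stand in for the returned Document objects):

```python
def format_sources(sources):
    """Build a Markdown bullet list of cited sources, deduplicated by filename."""
    seen, lines = set(), []
    for source in sources:
        name = source["meta"]["filename"]
        if name not in seen:
            seen.add(name)
            lines.append(f"- [{name}] {source['content'][:80]}...")
    return "\n".join(lines)
```

Appending this list under the answer keeps the grounding visible to end users.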
Writing documents
Upload content to Dewey directly from a Haystack pipeline using DeweyDocumentStore.write_documents:
from haystack import Document
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack.utils import Secret
store = DeweyDocumentStore(
api_key=Secret.from_env_var("DEWEY_API_KEY"),
collection_id="3f7a1b2c-...",
)
store.write_documents([
Document(content="Neural networks learn via backpropagation.", meta={"source": "ml-intro.txt"}),
Document(content="Transformers use self-attention mechanisms.", meta={"source": "transformers.txt"}),
])
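For bulk uploads, it can be convenient to build the document payloads from a local folder first. A minimal sketch using only the standard library; the resulting (content, meta) pairs map directly onto the Document(content=..., meta=...) calls above (the helper name and .txt-only filter are illustrative assumptions):

```python
from pathlib import Path

def docs_from_folder(folder):
    """Collect content/meta pairs for every .txt file in a folder, sorted by name.

    Feed each pair to Document(content=..., meta=...) before write_documents.
    """
    return [
        {"content": p.read_text(encoding="utf-8"), "meta": {"source": p.name}}
        for p in sorted(Path(folder).glob("*.txt"))
    ]
```

Dewey also accepts binary formats like PDF and Word upstream, so this pattern is only for text you already have on disk.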
License
dewey-haystack is released under the MIT License.