Maintained by deepset
Integration: Pyversity
A Ranker component for result diversification in retrieval pipelines
Table of Contents
Overview
Pyversity is a library for result diversification in retrieval pipelines. This integration wraps pyversity’s diversification algorithms as a Haystack component, making it easy to balance relevance and diversity in your search results.
The PyversityRanker reranks documents by trading off between relevance scores and embedding-based diversity using strategies such as Maximal Marginal Relevance (MMR) or Determinantal Point Processes (DPP).
Installation
pip install pyversity-haystack
Usage
Components
This integration introduces one component:
PyversityRanker: Reranks documents using pyversity’s diversification algorithms. Documents must have bothscoreandembeddingpopulated (e.g. as returned by a dense retriever withreturn_embedding=True).
The ranker accepts the following parameters:
top_k: Number of documents to return. IfNone, all documents are returned in diversified order.strategy: Diversification strategy โMMR,MSD,DPP,COVER, orSSD. Defaults toStrategy.DPP. See supported strategies for details.diversity: Trade-off between relevance and diversity in[0, 1].0.0keeps only the most relevant documents;1.0maximises diversity. Defaults to0.5.
Standalone
from haystack import Document
from haystack_integrations.components.rankers.pyversity import PyversityRanker
from pyversity import Strategy
ranker = PyversityRanker(top_k=5, strategy=Strategy.MMR, diversity=0.5)
docs = [
Document(content="Paris is the capital of France.", score=0.9, embedding=[0.1, 0.2]),
Document(content="Berlin is the capital of Germany.", score=0.8, embedding=[0.3, 0.4]),
Document(content="The Eiffel Tower is in Paris.", score=0.7, embedding=[0.15, 0.25]),
]
output = ranker.run(documents=docs)
reranked_docs = output["documents"]
Pipeline
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from pyversity import Strategy
from haystack_integrations.components.rankers.pyversity import PyversityRanker
# Index documents
document_store = InMemoryDocumentStore()
raw_documents = [
Document(content="Paris is the capital of France."),
Document(content="The Eiffel Tower is located in Paris."),
Document(content="Berlin is the capital of Germany."),
Document(content="The Brandenburg Gate is in Berlin."),
Document(content="France borders Spain to the south."),
Document(content="The Louvre is the world's largest art museum and is in Paris."),
]
doc_embedder = SentenceTransformersDocumentEmbedder()
documents_with_embeddings = doc_embedder.run(raw_documents)["documents"]
document_store.write_documents(documents_with_embeddings)
# Build pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
pipeline.add_component(
"retriever",
InMemoryEmbeddingRetriever(document_store=document_store, top_k=6, return_embedding=True),
)
pipeline.add_component("reranker", PyversityRanker(top_k=3, strategy=Strategy.MMR, diversity=0.7))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "reranker.documents")
# Run
result = pipeline.run({"text_embedder": {"text": "What are the famous landmarks in France?"}})
for doc in result["reranker"]["documents"]:
print(f"{doc.score:.4f} {doc.content}")
License
pyversity-haystack is distributed under the terms of the
Apache-2.0 license.
