Integration: Sentence Transformers
Use Sentence Transformers embedding and ranking models in your Haystack pipelines
Table of Contents
Overview
Sentence Transformers is a library for state-of-the-art embedding and reranking models. With this integration, you can run Sentence Transformers compatible models from the Hugging Face Hub locally, on your own machine, in your Haystack pipelines.
Haystack supports Hugging Face models in other ways too:
- Hugging Face Transformers for other local models (LLMs, extractive QA, classification, NER)
- Hugging Face API to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
- Optimum for high-performance inference with ONNX Runtime
Installation
pip install haystack-ai "sentence-transformers>=5.0.0"
Usage
Components
Haystack provides several components based on Sentence Transformers:
- Embedders:
-
SentenceTransformersTextEmbedder: creates a dense embedding for text (used in query/RAG pipelines). -
SentenceTransformersDocumentEmbedder: enriches documents with dense embeddings (used in indexing pipelines). -
SentenceTransformersSparseTextEmbedder: creates a sparse embedding for text (used in query/RAG pipelines). -
SentenceTransformersSparseDocumentEmbedder: enriches documents with sparse embeddings (used in indexing pipelines). -
SentenceTransformersDocumentImageEmbedder: enriches documents with embeddings computed from their images.
-
- Rankers:
-
SentenceTransformersSimilarityRanker: ranks documents based on their similarity to the query, using cross-encoder models. -
SentenceTransformersDiversityRanker: ranks documents to maximize their overall diversity.
-
Embedding Models
To create semantic embeddings for documents, use SentenceTransformersDocumentEmbedder in your indexing pipeline. For generating embeddings for queries, use SentenceTransformersTextEmbedder.
Below is an example of a document retrieval pipeline, after the documents have been indexed with their embeddings:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [Document(content="My name is Wolfgang and I live in Berlin"),
Document(content="I saw a black horse running"),
Document(content="Germany has many big cities")]
document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})
Sparse Embedding Models
Sparse embedding models like SPLADE produce interpretable embeddings and can perform better than dense models in out-of-domain settings. Currently, sparse embedding retrieval is supported by the Qdrant Document Store.
from haystack.components.embedders import SentenceTransformersSparseTextEmbedder
text_embedder = SentenceTransformersSparseTextEmbedder()
print(text_embedder.run("I love pizza!"))
# {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
Ranking Models
To rank documents based on their relevance to the query, use SentenceTransformersSimilarityRanker with a cross-encoder model:
from haystack import Document
from haystack.components.rankers import SentenceTransformersSimilarityRanker
ranker = SentenceTransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
docs = [Document(content="Paris"), Document(content="Berlin")]
result = ranker.run(query="City in Germany", documents=docs)
print(result["documents"][0].content)
# Berlin
