Maintained by deepset

Integration: Sentence Transformers

Use Sentence Transformers embedding and ranking models in your Haystack pipelines

Authors
deepset

Overview

Sentence Transformers is a library for state-of-the-art embedding and reranking models. With this integration, you can run Sentence Transformers-compatible models from the Hugging Face Hub locally in your Haystack pipelines.

Haystack supports Hugging Face models in other ways too:

  • Hugging Face Transformers for other local models (LLMs, extractive QA, classification, NER)
  • Hugging Face API to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
  • Optimum for high-performance inference with ONNX Runtime

Installation

pip install haystack-ai "sentence-transformers>=5.0.0"

Usage

Components

Haystack provides several components based on Sentence Transformers, grouped below by model type.

Embedding Models

To create semantic embeddings for documents, use SentenceTransformersDocumentEmbedder in your indexing pipeline. To embed queries, use SentenceTransformersTextEmbedder in your query pipeline, with the same model so that query and document embeddings are comparable.

The example below embeds three documents, writes them to an in-memory document store, and then runs a retrieval pipeline against them:

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

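result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})

# The retriever returns documents sorted by similarity to the query
print(result["retriever"]["documents"][0].content)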
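
For intuition, the cosine similarity selected by embedding_similarity_function="cosine" can be sketched in plain Python. This is a minimal illustration, not Haystack's implementation: the helper names cosine_similarity and top_k and the three-dimensional toy vectors are assumptions made for this example, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, doc_embeddings, k=2):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(doc_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, made up for illustration only
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

ranked = top_k(query, docs, k=2)
print([i for i, _ in ranked])  # [0, 2]
```

The document whose vector points in the same direction as the query ranks first, regardless of vector length, which is why cosine similarity is a common choice for comparing text embeddings.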

Sparse Embedding Models

Sparse embedding models like SPLADE produce interpretable embeddings and can perform better than dense models in out-of-domain settings. Currently, sparse embedding retrieval is supported by the Qdrant Document Store.

from haystack.components.embedders import SentenceTransformersSparseTextEmbedder

text_embedder = SentenceTransformersSparseTextEmbedder()

print(text_embedder.run("I love pizza!"))
# {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
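
The indices/values layout shown in the output above lends itself to a simple relevance score: a dot product over the indices that the query and a document share. The sketch below illustrates that idea in plain Python. The function name sparse_dot and all indices and weights are made up for this example, and the scoring a given document store actually applies may differ.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    weights_b = dict(zip(indices_b, values_b))
    # Only indices present in both vectors contribute to the score
    return sum(v * weights_b.get(i, 0.0) for i, v in zip(indices_a, values_a))


# Hypothetical indices and weights, for illustration only
query_indices, query_values = [999, 1045, 2293], [0.5, 1.0, 2.0]
doc_indices, doc_values = [1045, 2293, 10733], [2.0, 0.5, 3.0]

score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
print(score)  # 3.0  (1.0 * 2.0 + 2.0 * 0.5)
```

Because every non-zero index contributes a visible term to the score, it is possible to see which entries drove a match, which is one sense in which sparse embeddings are interpretable.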

Ranking Models

To rank documents based on their relevance to the query, use SentenceTransformersSimilarityRanker with a cross-encoder model:

from haystack import Document
from haystack.components.rankers import SentenceTransformersSimilarityRanker

ranker = SentenceTransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")

docs = [Document(content="Paris"), Document(content="Berlin")]
result = ranker.run(query="City in Germany", documents=docs)
print(result["documents"][0].content)
# Berlin
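
Conceptually, a ranker scores each (query, document) pair and sorts the documents by that score. The sketch below shows this pattern in plain Python. The rerank helper and the word_overlap scorer are hypothetical stand-ins for this example: a real cross-encoder computes a learned relevance score, not a word count.

```python
def rerank(query, documents, score_fn, top_k=None):
    """Sort documents by score_fn(query, document), highest first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [doc for _, doc in scored]
    return ranked if top_k is None else ranked[:top_k]


def word_overlap(query, doc):
    """Toy stand-in for a cross-encoder: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


docs = ["Paris is the capital of France", "Berlin is a city in Germany"]
reranked = rerank("City in Germany", docs, word_overlap)
print(reranked[0])  # Berlin is a city in Germany
```

Scoring each pair separately is more expensive than comparing precomputed embeddings, so a ranker is usually applied to a small candidate set returned by a retriever rather than to the whole document store.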