Maintained by deepset

Integration: Hugging Face API

Use models through Hugging Face APIs - Inference Providers, Inference Endpoints, TGI and TEI

Authors
deepset

Overview

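With this integration, you can use models through Hugging Face APIs: Inference Providers, Inference Endpoints, and self-hosted TGI (Text Generation Inference) and TEI (Text Embeddings Inference) servers.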

Haystack supports Hugging Face models in other ways too.

Installation

pip install huggingface-api-haystack

Usage

Unless you are using a self-hosted TGI/TEI server, set your Hugging Face token as the HF_API_TOKEN or HF_TOKEN environment variable.
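As a quick sanity check before building a component, the sketch below looks for a token in the two environment variables named above. The helper `resolve_hf_token` is hypothetical and not part of the integration, and the lookup order (HF_API_TOKEN first, then HF_TOKEN) is an assumption taken from the order in which the variables are listed here.

```python
import os
from typing import Mapping, Optional

# Assumed lookup order: the order in which the variables are named on this page.
TOKEN_ENV_VARS = ("HF_API_TOKEN", "HF_TOKEN")


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token found in `env`, or None if neither is set.

    Hypothetical helper for illustration only; the integration reads the
    environment on its own.
    """
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


token = resolve_hf_token()
if token is None:
    print("No Hugging Face token set (fine for a self-hosted TGI/TEI server).")
else:
    print("Hugging Face token found.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.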

Components

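This integration provides several components to interact with Hugging Face APIs, including HuggingFaceAPIChatGenerator for chat generation, HuggingFaceAPIDocumentEmbedder and HuggingFaceAPITextEmbedder for embeddings, and HuggingFaceTEIRanker for ranking.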

Chat Generation

Use HuggingFaceAPIChatGenerator with the Serverless Inference API (Inference Providers):

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.huggingface_api import HuggingFaceAPIChatGenerator

generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"},
)

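# Wrap the prompt in a ChatMessage and pass the messages as a list
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
result = generator.run(messages)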
print(result)

To use a dedicated Inference Endpoint or a self-hosted TGI server, pass its URL instead:

generator = HuggingFaceAPIChatGenerator(
    api_type="inference_endpoints",  # or "text_generation_inference" for self-hosted TGI
    api_params={"url": "<your-endpoint-url>"},
)
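The two chat examples differ only in which keys go into `api_params`: a model (and optionally a provider) for the Serverless Inference API, and a URL for the other API types. The sketch below captures that rule in one place. `build_api_params` is a hypothetical helper, not part of the integration, and it covers only the `api_type` values that appear on this page.

```python
from typing import Dict, Optional

# api_type values that appear in this page's examples.
MODEL_BASED = {"serverless_inference_api"}
URL_BASED = {
    "inference_endpoints",
    "text_generation_inference",
    "text_embeddings_inference",
}


def build_api_params(
    api_type: str,
    *,
    model: Optional[str] = None,
    url: Optional[str] = None,
    provider: Optional[str] = None,
) -> Dict[str, str]:
    """Build an `api_params` dict shaped like the examples on this page.

    Hypothetical helper for illustration; it fails early when the key
    required by the chosen api_type is missing.
    """
    if api_type in MODEL_BASED:
        if not model:
            raise ValueError(f"{api_type} needs a model id")
        params = {"model": model}
        if provider:
            params["provider"] = provider
        return params
    if api_type in URL_BASED:
        if not url:
            raise ValueError(f"{api_type} needs a url")
        return {"url": url}
    raise ValueError(f"unknown api_type: {api_type!r}")


print(build_api_params("serverless_inference_api",
                       model="Qwen/Qwen2.5-7B-Instruct", provider="together"))
print(build_api_params("text_generation_inference", url="http://localhost:8080"))
```

The resulting dictionary can then be passed as `api_params` when constructing the generator.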

Embedding Models

To create semantic embeddings for documents, use HuggingFaceAPIDocumentEmbedder in your indexing pipeline. To embed queries, use HuggingFaceAPITextEmbedder.

from haystack_integrations.components.embedders.huggingface_api import HuggingFaceAPITextEmbedder

text_embedder = HuggingFaceAPITextEmbedder(
    api_type="serverless_inference_api",
    api_params={"model": "BAAI/bge-small-en-v1.5"},
)

print(text_embedder.run("I love pizza!"))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}

Both embedders also work with a self-hosted TEI server:

text_embedder = HuggingFaceAPITextEmbedder(
    api_type="text_embeddings_inference",
    api_params={"url": "http://localhost:8080"},
)
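To illustrate what the embeddings are for, here is a minimal sketch that compares a query vector with document vectors using cosine similarity. The three-dimensional vectors are made-up toy values, not real model output, and in a Haystack pipeline this comparison is normally done by a retriever rather than by hand.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embedder output (assumption: real ones are much longer).
query_embedding = [1.0, 0.0, 1.0]
doc_embeddings = {
    "pizza": [0.9, 0.1, 0.8],
    "tax law": [-0.2, 1.0, 0.0],
}

# Sort document names by similarity to the query, most similar first.
ranked = sorted(
    doc_embeddings,
    key=lambda name: cosine_similarity(query_embedding, doc_embeddings[name]),
    reverse=True,
)
print(ranked)
```

The same model should be used for both documents and queries so that their vectors are comparable.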

Ranking Models

Use HuggingFaceTEIRanker to rank documents with a reranking model served by a TEI endpoint:

from haystack import Document
from haystack_integrations.components.rankers.huggingface_api import HuggingFaceTEIRanker

ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)

docs = [Document(content="The capital of France is Paris"),
        Document(content="The capital of Germany is Berlin")]

result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"][0].content)
# The capital of France is Paris
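The effect of `top_k` can be sketched without a running server. The example below assumes the reranking endpoint returns one `{"index": ..., "score": ...}` entry per input text, which is my understanding of the TEI rerank response and worth checking against the TEI documentation. The scores are invented, and `apply_rerank_scores` is a hypothetical helper, not the ranker's actual implementation.

```python
from typing import Dict, List


def apply_rerank_scores(texts: List[str], scored: List[Dict], top_k: int) -> List[str]:
    """Order `texts` by descending score and keep the first `top_k`.

    Each entry in `scored` is assumed to carry the position of a text in
    `texts` ("index") and its relevance score ("score").
    """
    ordered = sorted(scored, key=lambda item: item["score"], reverse=True)
    return [texts[item["index"]] for item in ordered[:top_k]]


texts = [
    "The capital of Germany is Berlin",
    "The capital of France is Paris",
    "Pizza was invented in Naples",
]

# Invented scores for the query "What is the capital of France?"
scored = [
    {"index": 0, "score": 0.12},
    {"index": 1, "score": 0.98},
    {"index": 2, "score": 0.01},
]

for text in apply_rerank_scores(texts, scored, top_k=2):
    print(text)
```

With `top_k=2`, only the two highest-scoring texts survive, which mirrors how the ranker above returns at most two documents.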