Integration: LiteLLM
Use any of 100+ LLM providers with Haystack through LiteLLM
Overview
LiteLLM provides a single, unified interface to over 100 LLM providers, including OpenAI, Anthropic, Google, AWS Bedrock, Azure, Cohere, Mistral, and Groq. This integration brings that unified interface to Haystack through the LiteLLMChatGenerator, so you can switch between providers by changing only the model string, without rewriting your pipeline.
Model names use the LiteLLM provider/model-name format, for example openai/gpt-4o, anthropic/claude-sonnet-4-20250514, or bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0. For the full list of supported providers and their model identifiers, see the LiteLLM providers documentation.
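To make the provider/model-name format concrete, here is a minimal sketch that splits a model string at its first slash. The helper split_model_string is hypothetical and written only for illustration; it is not part of LiteLLM or of this integration, and it assumes every string carries a provider prefix.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name


# The three example identifiers from the paragraph above.
for model in (
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-20250514",
    "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
):
    provider, name = split_model_string(model)
    print(f"{provider:<10} {name}")
```

Only the first slash separates the provider, so the Bedrock identifier keeps its dots and colon intact in the model-name part.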
The LiteLLMChatGenerator supports streaming, tool/function calling, and asynchronous execution.
Installation
pip install litellm-haystack
Usage
LiteLLMChatGenerator needs an API key for the selected provider. LiteLLM reads it from the provider’s standard environment variable (for example, OPENAI_API_KEY or ANTHROPIC_API_KEY), so make sure the relevant variable is set before running. You can also pass the key explicitly through the api_key init parameter using Haystack’s Secret API.
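Before creating the generator, it can help to check that the expected variable is set. The sketch below uses only the standard library. The mapping PROVIDER_ENV_VARS and the helper missing_key_variable are hypothetical names introduced here, and the mapping covers only the two variables mentioned above; other providers would need their own entries.

```python
import os
from typing import Mapping, Optional

# Assumption: only the two variable names mentioned in the text are listed here.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_key_variable(
    model: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the unset variable name for the model's provider.

    Returns None when the variable is set or the provider is not in the mapping.
    """
    env = os.environ if env is None else env
    provider = model.partition("/")[0]
    variable = PROVIDER_ENV_VARS.get(provider)
    if variable is not None and not env.get(variable):
        return variable
    return None


missing = missing_key_variable("anthropic/claude-sonnet-4-20250514")
if missing:
    print(f"Set {missing} before creating the generator.")
```

The optional env argument exists so the check can be exercised with a plain dictionary instead of the real environment.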
Using LiteLLMChatGenerator
Here is a simple example that calls a model directly. Switch providers by changing only the model string.
# Set the relevant provider key, e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY, in your environment.
from haystack_integrations.components.generators.litellm import LiteLLMChatGenerator
from haystack.dataclasses import ChatMessage
generator = LiteLLMChatGenerator(
    model="anthropic/claude-sonnet-4-20250514",
    generation_kwargs={"max_tokens": 1024, "temperature": 0.7},
)
messages = [
    ChatMessage.from_system("You are a helpful assistant"),
    ChatMessage.from_user("What's Natural Language Processing? Be brief."),
]
result = generator.run(messages=messages)
print(result["replies"][0].text)
In a pipeline
Below is an example RAG pipeline that answers a question using the contents of a URL. We fetch the URL, convert it to a document, build the prompt, and generate the answer with the LiteLLMChatGenerator.
# Set the relevant provider key, e.g. OPENAI_API_KEY, in your environment.
# HTMLToDocument needs trafilatura: pip install trafilatura
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import HTMLToDocument
from haystack.components.fetchers import LinkContentFetcher
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.litellm import LiteLLMChatGenerator
messages = [
    ChatMessage.from_system("You answer questions based on the given documents."),
    ChatMessage.from_user(
        "Here are the documents:\n"
        "{% for d in documents %} \n"
        " {{d.content}} \n"
        "{% endfor %}"
        "\nAnswer: {{query}}"
    ),
]
rag_pipeline = Pipeline()
rag_pipeline.add_component("fetcher", LinkContentFetcher())
rag_pipeline.add_component("converter", HTMLToDocument())
rag_pipeline.add_component("prompt_builder", ChatPromptBuilder(variables=["documents"]))
rag_pipeline.add_component("llm", LiteLLMChatGenerator(model="openai/gpt-4o"))
rag_pipeline.connect("fetcher", "converter")
rag_pipeline.connect("converter", "prompt_builder")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
question = "What is Haystack?"
result = rag_pipeline.run(
    data={
        "fetcher": {"urls": ["https://haystack.deepset.ai/overview/intro"]},
        "prompt_builder": {"template_variables": {"query": question}, "template": messages},
    }
)
print(result["llm"]["replies"][0].text)
Streaming
Pass a callback to streaming_callback to stream the response as it is generated. Use the built-in print_streaming_chunk to print text tokens and tool events.
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.litellm import LiteLLMChatGenerator
generator = LiteLLMChatGenerator(model="openai/gpt-4o", streaming_callback=print_streaming_chunk)
generator.run([ChatMessage.from_user("Tell me about Natural Language Processing in two sentences.")])
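If you want to keep the streamed text instead of printing it, a custom callback is one option. The sketch below is a hypothetical ChunkCollector class, not part of Haystack or this integration. It assumes each streamed chunk exposes its text through a content attribute, as Haystack's StreamingChunk does, and it uses SimpleNamespace stand-ins so the example runs without a provider key.

```python
from types import SimpleNamespace


class ChunkCollector:
    """Callable that accumulates streamed text (hypothetical helper)."""

    def __init__(self) -> None:
        self.parts: list[str] = []

    def __call__(self, chunk) -> None:
        # Assumption: the chunk carries its text in a `content` attribute.
        # Chunks with empty content (e.g. tool events) are skipped.
        if chunk.content:
            self.parts.append(chunk.content)

    @property
    def text(self) -> str:
        return "".join(self.parts)


collector = ChunkCollector()

# Stand-in chunks; in real use the generator would call the collector itself.
for piece in ("Natural ", "", "Language ", "Processing"):
    collector(SimpleNamespace(content=piece))

print(collector.text)
```

In real use, the collector would be passed as streaming_callback=collector in place of print_streaming_chunk, and collector.text read after run returns.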
