πŸ–ΌοΈ Multimodal pipelines and agents with image understanding and enhanced debugging with agent breakpoints in Haystack 2.16 ⚑️

Haystack 2.16.0


⭐️ Highlights

🧠 Agent Breakpoints

This release introduces Agent Breakpoints, a powerful new feature that enhances debugging and observability when working with Haystack Agents. You can pause execution mid-run by inserting breakpoints in the Agent or its tools to inspect internal state and resume execution seamlessly. This brings fine-grained control to agent development and significantly improves traceability during complex interactions.

from haystack.dataclasses.breakpoints import AgentBreakpoint, Breakpoint
from haystack.dataclasses import ChatMessage

chat_generator_breakpoint = Breakpoint(
    component_name="chat_generator", 
    visit_count=0, 
    snapshot_file_path="debug_snapshots"
)
agent_breakpoint = AgentBreakpoint(break_point=chat_generator_breakpoint, agent_name='calculator_agent')

response = agent.run(
    messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")],
    break_point=agent_breakpoint
)
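
When the breakpoint triggers, the run pauses and the agent's state is written as a snapshot file under debug_snapshots, which you can inspect and later resume from. A minimal resume sketch follows; the load_pipeline_snapshot helper, its import path, and the snapshot argument to Agent.run are assumptions here, so check the breakpoints documentation for the exact API:

from haystack.core.pipeline.breakpoint import load_pipeline_snapshot  # assumed import path

# Load the snapshot written when the breakpoint fired (hypothetical file name)
snapshot = load_pipeline_snapshot("debug_snapshots/calculator_agent_snapshot.json")

# Resume the paused run from where it stopped; the `snapshot` kwarg is an assumption
response = agent.run(
    messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")],
    snapshot=snapshot
)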

πŸ–ΌοΈ Multimodal Pipelines and Agents

You can now blend text and image capabilities across generation, indexing, and retrieval in Haystack.

  • New ImageContent Dataclass: A dedicated structure to store image data along with base64_image, mime_type, detail, and metadata.

  • Image-Aware Chat Generators: Image inputs are now supported in OpenAIChatGenerator:

from haystack.dataclasses import ImageContent, ChatMessage
from haystack.components.generators.chat import OpenAIChatGenerator

image_url = "https://cdn.britannica.com/79/191679-050-C7114D2B/Adult-capybara.jpg"
image_content = ImageContent.from_url(image_url)

message = ChatMessage.from_user(
    content_parts=["Describe the image in short.", image_content]
)

llm = OpenAIChatGenerator(model="gpt-4o-mini")
print(llm.run([message])["replies"][0].text)

  • Powerful Multimodal Components:

    • PDFToImageContent, ImageFileToImageContent, DocumentToImageContent: Convert PDFs, image files, and Documents into ImageContent objects.
    • LLMDocumentContentExtractor: Extract text from images using a vision-enabled LLM.
    • SentenceTransformersDocumentImageEmbedder: Generate embeddings from image-based documents using models like CLIP.
    • DocumentLengthRouter: Route documents based on textual content length, ideal for distinguishing scanned PDFs from text-based ones (see the sketch after this list).
    • DocumentTypeRouter: Route documents automatically based on MIME type metadata.
  • Prompt Building with Image Support: The ChatPromptBuilder now supports templates with embedded images, enabling dynamic multimodal prompt creation.
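
As a concrete example of the routing item above, here is a minimal sketch that separates text-based documents from image-only ones so the latter can be sent down an OCR or vision branch. The import path, the threshold parameter, and the "short"/"long" output names are assumptions about DocumentLengthRouter's API:

from haystack.components.routers import DocumentLengthRouter  # assumed import path
from haystack.dataclasses import Document

docs = [
    Document(content="A paragraph extracted from a text-based PDF."),
    Document(content="", meta={"file_path": "scanned_page.pdf"}),  # scanned page, no text layer
]

# Assumption: documents whose content length is <= threshold come out on "short",
# the rest on "long"; route the "short" ones to a vision/OCR branch.
router = DocumentLengthRouter(threshold=10)
result = router.run(documents=docs)
print(len(result["short"]), len(result["long"]))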

With these additions, you can now build multimodal agents and RAG pipelines that reason over both text and visual content, unlocking richer interactions and retrieval capabilities.

πŸ‘‰ Learn more about multimodality in our Introduction to Multimodal Text Generation.

πŸš€ New Features

  • Add to_dict and from_dict to ByteStream so that, like our other dataclasses, it provides serialization and deserialization methods.
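
    A minimal round-trip sketch:

from haystack.dataclasses import ByteStream

# Build a ByteStream and serialize it to a plain dict
bs = ByteStream.from_string("hello world", mime_type="text/plain")
data = bs.to_dict()

# Rebuild an equivalent object from the dict
restored = ByteStream.from_dict(data)
assert restored.data == bs.data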

  • Add to_dict and from_dict to StreamingChunk, ToolCallResult, ToolCall, ComponentInfo, and ToolCallDelta so that, like our other dataclasses, they provide serialization and deserialization methods.

  • Added the tool_invoker_kwargs param to Agent so additional kwargs, such as max_workers and enable_streaming_callback_passthrough, can be passed to the ToolInvoker.
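
    For example (a sketch; the empty tools list is a placeholder for your own Tool objects):

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    tools=[],  # placeholder: add your Tool objects here
    tool_invoker_kwargs={
        "max_workers": 4,  # threads used for parallel tool invocation
        "enable_streaming_callback_passthrough": True,
    },
)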

  • ChatPromptBuilder now supports special string templates in addition to a list of ChatMessage objects. This new format is more flexible and allows structured parts like images to be included in the templatized ChatMessage.

    from haystack.components.builders import ChatPromptBuilder 
    from haystack.dataclasses.chat_message import ImageContent
    
    template = """
    {% message role="user" %}
    Hello! I am {{user_name}}. 
    What's the difference between the following images? 
    {% for image in images %}
    {{ image | templatize_part }}
    {% endfor %}
    {% endmessage %}
    """ 
    
    images = [
        ImageContent.from_file_path("apple-fruit.jpg"),
        ImageContent.from_file_path("apple-logo.jpg")
    ]
    
    builder = ChatPromptBuilder(template=template)
    builder.run(user_name="John", images=images)
    
  • Added convenience class methods to the ImageContent dataclass to create ImageContent objects from file paths and URLs.

  • Added multiple converters to help convert image data between different formats:

    • DocumentToImageContent: Converts documents sourced from PDF and image files into ImageContent objects.
    • ImageFileToImageContent: Converts image files to ImageContent objects.
    • ImageFileToDocument: Converts image file references into empty Document objects with associated metadata.
    • PDFToImageContent: Converts PDF files to ImageContent objects.

  • Chat Messages with the user role can now include images using the new ImageContent dataclass. We’ve added image support to OpenAIChatGenerator, and plan to support more model providers over time.

  • Raise a warning when a pipeline can no longer proceed because all remaining components are blocked from running and no expected pipeline outputs have been produced. This scenario can occur legitimately, for example in pipelines with mutually exclusive branches where some components are intentionally blocked. To help avoid false positives, the check ensures that none of the expected outputs (as defined by Pipeline().outputs()) have been generated during the current run.

  • Added source_id_meta_field and split_id_meta_field to SentenceWindowRetriever for customizable metadata field names. Also added raise_on_missing_meta_fields (True by default) to control whether a ValueError is raised when documents at runtime are missing the required meta fields. If False, documents missing the meta fields are skipped when retrieving their windows, but the original documents are still included in the results.
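
    A sketch with the new parameters (the document store and field names are placeholders):

from haystack.components.retrievers import SentenceWindowRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

retriever = SentenceWindowRetriever(
    document_store=InMemoryDocumentStore(),
    window_size=2,
    source_id_meta_field="source_id",    # meta field that identifies the parent document
    split_id_meta_field="split_id",      # meta field that holds the chunk position
    raise_on_missing_meta_fields=False,  # skip windows for documents missing these fields
)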

  • Add a ComponentInfo dataclass to the haystack.dataclasses module. It stores information about a component, and we pass it to StreamingChunk so we can tell which component a stream is coming from.

  • Pass the component_info to the StreamingChunk in the OpenAIChatGenerator, AzureOpenAIChatGenerator, HuggingFaceAPIChatGenerator and HuggingFaceLocalChatGenerator.

  • Added the enable_streaming_callback_passthrough parameter to the ToolInvoker init, run and run_async methods. If set to True, the ToolInvoker will try to pass the streaming_callback function to a tool’s invoke method, but only if that method has streaming_callback in its signature.

  • Added dedicated finish_reason field to StreamingChunk class to improve type safety and enable sophisticated streaming UI logic. The field uses a FinishReason type alias with standard values: “stop”, “length”, “tool_calls”, “content_filter”, plus Haystack-specific value “tool_call_results” (used by ToolInvoker to indicate tool execution completion).

  • Updated ToolInvoker component to use the new finish_reason field when streaming tool results. The component now sets finish_reason="tool_call_results" in the final streaming chunk to indicate that tool execution has completed, while maintaining backward compatibility by also setting the value in meta["finish_reason"].
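
    For example, a streaming callback can branch on the typed field instead of inspecting meta (a minimal sketch):

from haystack.dataclasses import StreamingChunk

def on_chunk(chunk: StreamingChunk) -> None:
    # Print text deltas as they arrive
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # React to the typed finish reason
    if chunk.finish_reason == "tool_call_results":
        print("\n[tool execution finished]")
    elif chunk.finish_reason == "stop":
        print("\n[generation complete]")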

  • Added new HuggingFaceTEIRanker component to enable reranking with Text Embeddings Inference (TEI) API. This component supports both self-hosted Text Embeddings Inference services and Hugging Face Inference Endpoints.
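
    A sketch against a self-hosted TEI service (the URL is a placeholder, and the import path and top_k parameter name are assumptions):

from haystack.components.rankers import HuggingFaceTEIRanker  # assumed import path
from haystack.dataclasses import Document

ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=3)  # placeholder endpoint

docs = [
    Document(content="Capybaras are the largest living rodents."),
    Document(content="The Eiffel Tower is in Paris."),
]
result = ranker.run(query="What is the largest rodent?", documents=docs)
print(result["documents"][0].content)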

  • Added a raise_on_failure boolean parameter to OpenAIDocumentEmbedder and AzureOpenAIDocumentEmbedder. If set to True, the component raises an exception when there is an error with the API request. It is set to False by default, so the previous behavior of logging an exception and continuing remains the default.

  • ToolInvoker now executes tool_calls in parallel for both sync and async mode.

  • Add AsyncHFTokenStreamingHandler for async streaming support in HuggingFaceLocalChatGenerator.

  • We introduced the LLMMessagesRouter component, which routes Chat Messages to different connections, using a generative Language Model to perform classification. This component can be used with general-purpose LLMs and with specialized LLMs for moderation like Llama Guard.

    Usage example:

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# initialize a Chat Generator with a generative model for moderation
chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(
    chat_generator=chat_generator,
    output_names=["unsafe", "safe"],
    output_patterns=["unsafe", "safe"]
)

print(router.run([ChatMessage.from_user("How to rob a bank?")]))

  • For HuggingFaceAPIGenerator and HuggingFaceAPIChatGenerator, all additional key-value pairs passed in api_params are now passed to the initialization of the underlying inference clients. This allows passing additional client parameters such as timeout, headers, and provider, which means you can now easily specify a different inference provider via the provider key in api_params.
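
    For example, selecting a different provider and a custom timeout (both keys are simply forwarded to the underlying client; the model name is a placeholder):

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator

llm = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "provider": "together",  # forwarded to the underlying InferenceClient
        "timeout": 30,           # likewise forwarded
    },
)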

  • Updated StreamingChunk to add the fields tool_calls, tool_call_result, index, and start to make it easier to format the stream in a streaming callback.

  • Added new dataclass ToolCallDelta for the StreamingChunk.tool_calls field to reflect that the arguments can be a string delta.

  • Updated print_streaming_chunk and _convert_streaming_chunks_to_chat_message utility methods to use these new fields. This especially improves the formatting when using print_streaming_chunk with Agent.
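
    For example, streaming an Agent’s output with the improved utility (a sketch; assumes agent is a configured Agent):

from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

# Tool calls, tool results, and text deltas are all formatted by the utility
agent.run(
    messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")],
    streaming_callback=print_streaming_chunk,
)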

  • Updated OpenAIGenerator, OpenAIChatGenerator, HuggingFaceAPIGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalGenerator and HuggingFaceLocalChatGenerator to follow the new dataclasses.

  • Updated ToolInvoker to follow the StreamingChunk dataclass.

⬆️ Upgrade Notes

  • HuggingFaceAPIGenerator might no longer work with the Hugging Face Inference API. As of July 2025, the Hugging Face Inference API no longer offers generative models that support the text_generation endpoint. Generative models are now only available through providers that support the chat_completion endpoint. As a result, the HuggingFaceAPIGenerator component might not work with the Hugging Face Inference API. It still works with Hugging Face Inference Endpoints and self-hosted TGI instances. To use generative models via Hugging Face Inference API, please use the HuggingFaceAPIChatGenerator component, which supports the chat_completion endpoint.
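
    A minimal migration sketch (the model name is a placeholder):

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

# Before: HuggingFaceAPIGenerator relying on the text_generation endpoint.
# After: the chat generator, which uses the chat_completion endpoint.
llm = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "microsoft/Phi-3.5-mini-instruct"},  # placeholder model
)
print(llm.run([ChatMessage.from_user("Hello!")])["replies"][0].text)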

  • All parameters of the Pipeline.draw() and Pipeline.show() methods must now be specified as keyword arguments. Example:

pipeline.draw(
    path="output.png",
    server_url="https://custom-server.com",
    params=None,
    timeout=30,
    super_component_expansion=False
)

  • The deprecated async_executor parameter has been removed from the ToolInvoker class. Use the max_workers parameter instead; a ThreadPoolExecutor with that many workers is created automatically for parallel tool invocations.

  • The deprecated State class has been removed from the haystack.dataclasses module. The State class is now part of the haystack.components.agents module.

  • Remove the deserialize_value_with_schema_legacy function from the base_serialization module. This function was used to deserialize State objects created with Haystack 2.14.0 or older. Support for the old serialization format is removed in Haystack 2.16.0.

⚑️ Enhancement Notes

  • Add guess_mime_type parameter to ByteStream.from_file_path()

  • Add the init parameter skip_empty_documents to the DocumentSplitter component. The default value is True. Setting it to False can be useful when downstream components in the Pipeline (like LLMDocumentContentExtractor) can extract text from non-textual documents.
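
    For example, keeping empty documents so a downstream extractor still sees them (a sketch):

from haystack.components.preprocessors import DocumentSplitter

# Keep empty documents flowing so a downstream LLMDocumentContentExtractor
# can still handle the non-textual ones
splitter = DocumentSplitter(split_by="word", split_length=200, skip_empty_documents=False)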

  • Tested that our type validation and connection validation work with the builtin Python generic types introduced in 3.9. We found these types were already supported; we now add explicit tests for them.

  • We relaxed the requirement, introduced with ToolCallDelta in Haystack 2.15, that either the arguments or name parameter must be populated to create a ToolCallDelta dataclass. Removing this requirement brings us more in line with OpenAI’s SDK and fixes errors seen with some hosted versions of open source models that follow OpenAI’s SDK specification.

  • Added a return_embedding parameter to the InMemoryDocumentStore.__init__ method.

  • Updated the bm25_retrieval and filter_documents methods to use self.return_embedding to determine whether embeddings are returned.

  • Updated tests (test_in_memory and test_in_memory_embedding_retriever) to reflect the changes in InMemoryDocumentStore.

  • Added a new deserialize_component_inplace function to handle generic component deserialization that works with any component type.

  • Made doc-parser a core dependency, since the ComponentTool that uses it is one of the core Tool components.

  • Make the PipelineBase().validate_input method public so users can call it with confidence that it won’t receive breaking changes without warning. This method checks that all required connections in a pipeline have a connection and is automatically called in Pipeline.run; it is exposed as public for users who want to validate a pipeline before runtime.

  • For component run Datadog tracing, set the span resource name to the component name instead of the operation name.

  • Added a trust_remote_code parameter to the SentenceTransformersSimilarityRanker component. When set to True, this enables execution of custom models and scripts hosted on the Hugging Face Hub.

  • Add a new parameter require_tool_call_ids to ChatMessage.to_openai_dict_format. The default is True, for compatibility with OpenAI’s Chat API: if the id field is missing in a Tool Call, an error is raised. Using False is useful for shallow OpenAI-compatible APIs, where the id field is not required.
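
    For example (a sketch; the tool call here deliberately has no id):

from haystack.dataclasses import ChatMessage, ToolCall

# Some OpenAI-compatible APIs return tool calls without an id
msg = ChatMessage.from_assistant(tool_calls=[ToolCall(tool_name="add", arguments={"a": 1, "b": 2})])

# Would raise with the default require_tool_call_ids=True
openai_format = msg.to_openai_dict_format(require_tool_call_ids=False)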

  • Haystack’s core modules are now [“type complete”](https://typing.python.org/en/latest/guides/libraries.html#how-much-of-my-library-needs-types), meaning that all function parameters and return types are explicitly annotated. This increases the usefulness of the newly added py.typed marker and sidesteps differences in type inference between the various type checker implementations.

  • Refactored HuggingFaceAPIChatGenerator to use the utility method _convert_streaming_chunks_to_chat_message, for consistency in how we convert StreamingChunks into a final ChatMessage.

  • We also add ComponentInfo to the StreamingChunks created in HuggingFaceGenerator and HuggingFaceLocalGenerator, so we can tell which component a stream is coming from.

  • If only system messages are provided as input, a warning is logged indicating that this is likely not intended and that user messages should probably also be provided.

πŸ› Bug Fixes

  • Fixed _convert_streaming_chunks_to_chat_message, which converts Haystack StreamingChunks into a Haystack ChatMessage. This fixes the scenario where one StreamingChunk contains two ToolCallDeltas in StreamingChunk.tool_calls: both ToolCallDeltas are now saved correctly, whereas before they overwrote each other. This only occurs with some LLM providers, like Mistral (and not OpenAI), due to how those providers return tool calls.

  • Fixed a bug in the print_streaming_chunk utility function that prevented the tool call name from being printed.

  • Fixed the to_dict and from_dict of ToolInvoker to properly serialize the streaming_callback init parameter.

  • Fixed a bug where, if raise_on_failure=False and an error occurred mid-batch, the following embeddings would be paired with the wrong documents.

  • Fixed the component_invoker used by ComponentTool so it works when a dataclass like ChatMessage is passed directly to component_tool.invoke(...). Previously this would either cause an error or silently skip the input.

  • Fixed a bug in the LLMMetadataExtractor that occurred when processing Document objects with None or empty string content. The component now gracefully handles these cases by marking such documents as failed and providing an appropriate error message in their metadata, without attempting an LLM call.

  • RecursiveDocumentSplitter now generates a unique Document.id for every chunk. The meta fields (split_id, parent_id, etc.) are populated before Document creation, so the hash used for id generation is always unique.

  • In ConditionalRouter, fixed the to_dict and from_dict methods to properly handle the case where output_type is a list of types or a list of strings. This occurs when a user specifies a route in ConditionalRouter with multiple outputs.

  • Fix serialization of GeneratedAnswer when ChatMessage objects are nested in meta.

  • Fixed the serialization of ComponentTool and Tool when outputs_to_string is specified. Previously, an error occurred on deserialization right after serialization if outputs_to_string was not None.

  • When calling set_output_types, we now also check that the @component.output_types decorator is not present on the run_async method of a Component. Previously we only checked that the Component.run method did not have the decorator.

  • Fix type comparison in schema validation by replacing is not with != when checking the type List[ChatMessage]. This prevents false mismatches due to Python’s is operator comparing object identity instead of equality.

  • Re-export symbols in __init__.py files. This ensures that short imports like from haystack.components.builders import ChatPromptBuilder work equivalently to from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder, without causing errors or warnings in mypy/Pylance.

  • The SuperComponent class can now correctly serialize and deserialize a SuperComponent based on an async pipeline. Previously, the SuperComponent class always assumed the underlying pipeline was synchronous.

⚠️ Deprecation Notes

  • The async_executor parameter in ToolInvoker was deprecated in favor of the max_workers parameter and has been removed in Haystack 2.16.0 (see the upgrade notes above). Use max_workers to control the number of threads used for parallel tool calling.

πŸ’™ Big thank you to everyone who contributed to this release!

@Amnah199 @RafaelJohn9 @anakin87 @bilgeyucel @davidsbatista @julian-risch @kanenorman @kr1shnasomani @mathislucka @mpangrazzi @sjr @srishti-git1110