# Build with Llama Stack and Haystack Agent
Last Updated: July 21, 2025
This notebook demonstrates how to use the `LlamaStackChatGenerator` component with the Haystack `Agent` to enable function calling capabilities. We'll create a simple weather tool that the Agent can call to provide dynamic, up-to-date information.

We start by installing the integration package.
```bash
%%bash
pip install llama-stack-haystack
```
## Setup

Before running this example, you need to:

- Set up a Llama Stack Server through an inference provider
- Have a model available (e.g., `llama3.2:3b`)

For a quick start on how to set up a server with Ollama, see the Llama Stack documentation.

Once the server is running, it will typically be available at `http://localhost:8321/v1/openai/v1`.
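Before moving on, you can sanity-check that the server answers. The sketch below uses only the standard library and assumes the server exposes the OpenAI-compatible `/models` listing endpoint under the base URL above; adjust the URL if your setup differs.

```python
import urllib.request
import urllib.error

BASE_URL = "http://localhost:8321/v1/openai/v1"

def server_is_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the OpenAI-compatible /models endpoint responds."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

print("Llama Stack server reachable:", server_is_up())
```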
## Defining a Tool

Tools in Haystack allow models to call functions to get real-time information or to perform actions. Let's create a simple weather tool that the model can use to provide weather information.
```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

# Define a function that models can call as a tool
def weather(city: str):
    """Return mock weather info for the given city."""
    return f"The weather in {city} is sunny and 32°C"

# Define the tool parameters schema
tool_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"}
    },
    "required": ["city"]
}

# Create the weather tool
weather_tool = Tool(
    name="weather",
    description="Useful for getting the weather in a specific city",
    parameters=tool_parameters,
    function=weather,
)
```
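Before wiring the tool into an Agent, it is worth checking the pieces in isolation. The sketch below repeats the definitions from the cell above so it can run on its own:

```python
# Repeated from above so this snippet is self-contained
def weather(city: str):
    """Return mock weather info for the given city."""
    return f"The weather in {city} is sunny and 32°C"

tool_parameters = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# The tool's underlying function can be called directly, without any model:
print(weather("Berlin"))  # The weather in Berlin is sunny and 32°C

# Every required parameter should be declared in the schema's properties:
assert all(p in tool_parameters["properties"] for p in tool_parameters["required"])
```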
## Setting Up the Agent

Now, let's create a `LlamaStackChatGenerator` and pass it to the `Agent`. The Agent component will use the model running behind the `LlamaStackChatGenerator` to reason and make decisions.
```python
from haystack.components.agents import Agent
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator
from haystack.components.generators.utils import print_streaming_chunk

# Create the LlamaStackChatGenerator
chat_generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",  # model name varies depending on the inference provider used for the Llama Stack Server
    api_base_url="http://localhost:8321/v1/openai/v1",
)

# Create the Agent with the chat generator and the weather tool
agent = Agent(
    chat_generator=chat_generator,
    tools=[weather_tool],
)

# Warm up the Agent so it is ready to run
agent.warm_up()
```
## Using Tools with the Agent

Now, when we ask questions, the Agent will use both the provided tool and the `LlamaStackChatGenerator` to generate answers. We enable streaming in the Agent through `streaming_callback`, so you can observe the tool calls and results in real time.
```python
# Create a message asking about the weather
messages = [ChatMessage.from_user("What's the weather in Tokyo?")]

# Generate a response from the model with access to tools
response = agent.run(
    messages=messages,
    tools=[weather_tool],
    streaming_callback=print_streaming_chunk,
)
```
```
[TOOL CALL]
Tool: weather
Arguments: {"city":"Tokyo"}

[TOOL RESULT]
The weather in Tokyo is sunny and 32°C

[ASSISTANT]
Tokyo, the current weather conditions are mostly sunny with a temperature of 32°C. Would you like to know more about Tokyo's climate or weather forecast for a specific date?
```
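The `Arguments` line in the transcript is a JSON object. As a rough sketch of what happens under the hood, the Agent parses that payload and unpacks it into the tool's function as keyword arguments (the `weather` function is repeated here so the snippet runs on its own):

```python
import json

# Repeated from above so this snippet is self-contained
def weather(city: str):
    return f"The weather in {city} is sunny and 32°C"

# The model emits the tool arguments as a JSON string...
raw_arguments = '{"city":"Tokyo"}'

# ...which is parsed and unpacked into the tool function as keyword arguments
arguments = json.loads(raw_arguments)
result = weather(**arguments)
print(result)  # The weather in Tokyo is sunny and 32°C
```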
## Simple Chat with ChatGenerator

For a simpler use case, you can also build a lightweight chat loop directly on top of the `LlamaStackChatGenerator`.
```python
messages = []

while True:
    msg = input("Enter your message or Q to exit\n🧑 ")
    if msg == "Q":
        break
    messages.append(ChatMessage.from_user(msg))
    response = chat_generator.run(messages=messages)
    assistant_resp = response["replies"][0]
    print("🤖 " + assistant_resp.text)
    messages.append(assistant_resp)
```
🤖 The main character in The Witcher series, also known as the eponymous figure, is Geralt of Rivia, a monster hunter with supernatural abilities and mutations that allow him to control the elements. He was created by Polish author_and_polish_video_game_development_company_(CD Projekt).
🤖 One of the most fascinating aspects of dolphin behavior is their ability to produce complex, context-dependent vocalizations that are unique to each individual, similar to human language. They also exhibit advanced social behaviors, such as cooperation, empathy, and self-awareness.
If you want to switch model providers, you can reuse the same `LlamaStackChatGenerator` code with different providers: simply run the desired inference provider on the Llama Stack Server and update the `model` name when initializing the `LlamaStackChatGenerator`.
For more details on available inference providers, see Llama Stack docs.