# Build with Llama Stack and Haystack Agent
Last Updated: July 21, 2025
This notebook demonstrates how to use the `LlamaStackChatGenerator` component with the Haystack `Agent` to enable function calling capabilities. We'll create a simple weather tool that the Agent can call to provide dynamic, up-to-date information.

We start by installing the integration package.
```bash
%%bash
pip install llama-stack-haystack
```
## Setup

Before running this example, you need to:

- Set up a Llama Stack Server through an inference provider
- Have a model available (e.g., `llama3.2:3b`)

For a quick start on how to set up a server with Ollama, see the Llama Stack documentation.

Once the server is running, it will typically be available at `http://localhost:8321/v1/openai/v1`.
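Before moving on, you can sanity-check that the server answers. The sketch below uses only the standard library and assumes the server exposes the OpenAI-compatible `/models` listing endpoint under the base URL above; adjust the URL if your setup differs.

```python
import urllib.request
import urllib.error

BASE_URL = "http://localhost:8321/v1/openai/v1"

def server_is_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the OpenAI-compatible /models endpoint responds."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

print("Llama Stack server reachable:", server_is_up())
```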
## Defining a Tool

Tools in Haystack allow models to call functions to get real-time information or to perform actions. Let's create a simple weather tool that the model can use to provide weather information.
```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

# Define a function that models can call as a tool
def weather(city: str):
    """Return mock weather info for the given city."""
    return f"The weather in {city} is sunny and 32°C"

# Define the tool parameters schema
tool_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"}
    },
    "required": ["city"]
}

# Create the weather tool
weather_tool = Tool(
    name="weather",
    description="Useful for getting the weather in a specific city",
    parameters=tool_parameters,
    function=weather,
)
```
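Before wiring the tool into an Agent, it is worth checking the pieces in isolation. The sketch below repeats the definitions from the cell above so it can run on its own:

```python
# Repeated from above so this snippet is self-contained
def weather(city: str):
    """Return mock weather info for the given city."""
    return f"The weather in {city} is sunny and 32°C"

tool_parameters = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# The tool's underlying function can be called directly, without any model:
print(weather("Berlin"))  # The weather in Berlin is sunny and 32°C

# Every required parameter should be declared in the schema's properties:
assert all(p in tool_parameters["properties"] for p in tool_parameters["required"])
```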
## Setting Up the Agent

Now, let's create a `LlamaStackChatGenerator` and pass it to the `Agent`. The Agent component will use the model running behind the `LlamaStackChatGenerator` to reason and make decisions.
```python
from haystack.components.agents import Agent
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator
from haystack.components.generators.utils import print_streaming_chunk

# Create the LlamaStackChatGenerator
chat_generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",  # model name varies depending on the inference provider used for the Llama Stack Server
    api_base_url="http://localhost:8321/v1/openai/v1",
)

# Create the Agent with the chat generator and the weather tool
agent = Agent(
    chat_generator=chat_generator,
    tools=[weather_tool],
)

# Warm up the Agent so it is ready to run
agent.warm_up()
```
## Using Tools with the Agent

Now, when we ask questions, the Agent will use both the provided tool and the `LlamaStackChatGenerator` to generate answers. We enable streaming in the Agent through `streaming_callback`, so you can observe the tool calls and results in real time.
```python
# Create a message asking about the weather
messages = [ChatMessage.from_user("What's the weather in Tokyo?")]

# Generate a response from the model with access to tools
response = agent.run(
    messages=messages,
    tools=[weather_tool],
    streaming_callback=print_streaming_chunk,
)
```
```
[TOOL CALL]
Tool: weather
Arguments: {"city":"Tokyo"}

[TOOL RESULT]
The weather in Tokyo is sunny and 32°C

[ASSISTANT]
Tokyo, the current weather conditions are mostly sunny with a temperature of 32°C. Would you like to know more about Tokyo's climate or weather forecast for a specific date?
```
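The `Arguments` line in the transcript is a JSON object. As a rough sketch of what happens under the hood, the Agent parses that payload and unpacks it into the tool's function as keyword arguments (the `weather` function is repeated here so the snippet runs on its own):

```python
import json

# Repeated from above so this snippet is self-contained
def weather(city: str):
    return f"The weather in {city} is sunny and 32°C"

# The model emits the tool arguments as a JSON string...
raw_arguments = '{"city":"Tokyo"}'

# ...which is parsed and unpacked into the tool function as keyword arguments
arguments = json.loads(raw_arguments)
result = weather(**arguments)
print(result)  # The weather in Tokyo is sunny and 32°C
```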
## Simple Chat with ChatGenerator

For a simpler use case, you can also build a lightweight chat loop directly on top of the `LlamaStackChatGenerator`.
```python
messages = []

while True:
    msg = input("Enter your message or Q to exit\n🧑 ")
    if msg == "Q":
        break
    messages.append(ChatMessage.from_user(msg))
    response = chat_generator.run(messages=messages)
    assistant_resp = response["replies"][0]
    print("🤖 " + assistant_resp.text)
    messages.append(assistant_resp)
```
🤖 The main character in The Witcher series, also known as the eponymous figure, is Geralt of Rivia, a monster hunter with supernatural abilities and mutations that allow him to control the elements. He was created by Polish author_and_polish_video_game_development_company_(CD Projekt).
🤖 One of the most fascinating aspects of dolphin behavior is their ability to produce complex, context-dependent vocalizations that are unique to each individual, similar to human language. They also exhibit advanced social behaviors, such as cooperation, empathy, and self-awareness.
If you want to switch model providers, you can reuse the same `LlamaStackChatGenerator` code with different providers: simply run the desired inference provider on the Llama Stack Server and update the `model` name when initializing the `LlamaStackChatGenerator`.
For more details on available inference providers, see Llama Stack docs.