Name	AzureOpenAIChatGenerator
Folder Path	/generators/chat
Most common Position in a Pipeline	After a `PromptBuilder`
Mandatory Input variables	“messages”: a list of ChatMessage objects representing the chat
Output variables	“replies”: a list of alternative replies of the LLM to the input chat

Overview

AzureOpenAIChatGenerator supports OpenAI models deployed through Azure services. To see the list of supported models, head over to Azure documentation. The default model used with the component is gpt-35-turbo.

To work with Azure components, you will need an Azure OpenAI API key, as well as an Azure OpenAI Endpoint. You can learn more about them in Azure documentation.

The component uses AZURE_OPENAI_API_KEY and AZURE_OPENAI_AD_TOKEN environment variables by default. Otherwise, you can pass api_key and azure_ad_token at initialization:

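from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.utils import Secret

client = AzureOpenAIChatGenerator(azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
                        api_key=Secret.from_token("<your-api-key>"),
                        azure_deployment="<a model name>")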

📘
We recommend using environment variables instead of initialization parameters.
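
As a minimal sketch of the environment-variable approach, the helper below checks which of the two variables named above is set before the component is created. The function name `pick_azure_credential` and the preference for the API key over the AD token are assumptions made for illustration; they are not part of Haystack.

```python
import os


def pick_azure_credential(env=None):
    """Return the name of the first Azure OpenAI credential variable that is set.

    Hypothetical pre-flight check: it only inspects the given mapping
    (the real environment by default) and never contacts Azure.
    """
    env = os.environ if env is None else env
    for name in ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_AD_TOKEN"):
        if env.get(name):
            return name
    raise RuntimeError(
        "Set AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN before creating the component."
    )


# Example with an explicit mapping instead of the real environment.
print(pick_azure_credential({"AZURE_OPENAI_AD_TOKEN": "<your-token>"}))
```

Running a check like this at startup gives a clear error message early, instead of a failure when the component is first initialized.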

Then, the component needs a list of ChatMessage objects to operate. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, function), and optional metadata. See the usage section for an example.

You can pass any chat completion parameters that are valid for the openai.ChatCompletion.create method directly to AzureOpenAIChatGenerator through the generation_kwargs parameter, either at initialization or when calling the run() method. For more details on the supported parameters, refer to the Azure documentation.
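
To illustrate how the two levels can interact, here is a plain-Python sketch that merges init-time and run-time parameters. It assumes that values passed to run() take precedence over those given at initialization; that precedence and the helper name `merge_generation_kwargs` are assumptions to verify against the component's API reference. `temperature` and `max_tokens` are standard chat completion parameters.

```python
def merge_generation_kwargs(init_kwargs=None, run_kwargs=None):
    """Combine two generation_kwargs dicts; run-time values win on conflict (assumed)."""
    return {**(init_kwargs or {}), **(run_kwargs or {})}


init_kwargs = {"temperature": 0.2, "max_tokens": 256}  # defaults set once
run_kwargs = {"temperature": 0.9}                       # override for one call

print(merge_generation_kwargs(init_kwargs, run_kwargs))
# {'temperature': 0.9, 'max_tokens': 256}
```

With the component itself, `init_kwargs` would be passed as `generation_kwargs` at initialization and `run_kwargs` as `generation_kwargs` in the `run()` call.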

You can also specify a model for this component through the azure_deployment init parameter.

Streaming

AzureOpenAIChatGenerator supports streaming tokens from the LLM as they are generated. To enable it, pass a function to the streaming_callback init parameter. Streaming is only compatible with generating a single response, so n must be set to 1 for streaming to work.
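
A callback does not have to be a lambda. The sketch below shows a callable object that prints each chunk and also keeps the accumulated text. The class name `ChunkCollector` is hypothetical, and the stand-in chunks only mimic the `content` attribute that the streaming example in the usage section reads; they are not real Haystack objects.

```python
from types import SimpleNamespace


class ChunkCollector:
    """Callable that prints each streamed chunk and keeps the full text."""

    def __init__(self):
        self.parts = []

    def __call__(self, chunk):
        # Each chunk is expected to expose its text as `chunk.content`.
        self.parts.append(chunk.content)
        print(chunk.content, end="", flush=True)

    @property
    def text(self):
        return "".join(self.parts)


collector = ChunkCollector()
# Stand-in chunks so the sketch runs without an Azure deployment.
for piece in ("Natural ", "Language ", "Processing"):
    collector(SimpleNamespace(content=piece))
print()
print(collector.text)
```

An instance like `collector` could then be passed as the streaming_callback init parameter in place of a plain function.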

📘
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use OpenAI LLMs for text generation (such as translation or summarization tasks) or don’t want to use the ChatMessage object, use AzureOpenAIGenerator instead.

Usage

On its own

Basic usage:

from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import AzureOpenAIChatGenerator
client = AzureOpenAIChatGenerator()
response = client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
)
print(response)

>>> {'replies': [ChatMessage(content='Natural Language Processing (NLP) is a
subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. It involves enabling
computers to understand, interpret, and generate human language, enabling
various applications such as translation, sentiment analysis, chatbots, and
voice assistants.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None,
metadata={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason':
'stop', 'usage': {'prompt_tokens': 16, 'completion_tokens': 61,
'total_tokens': 77}})]}
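
The printed response is a dictionary, so the reply text and the token usage can be read from it directly. The following is a minimal sketch that assumes the `content` and `metadata` field names shown in the output above; `first_reply_usage` is a hypothetical helper, and the stand-in object only mimics the shape of a reply.

```python
from types import SimpleNamespace


def first_reply_usage(result):
    """Return (text, usage dict) for the first reply in a run() result."""
    reply = result["replies"][0]
    return reply.content, reply.metadata.get("usage", {})


# Stand-in object shaped like the printed response; not a real ChatMessage.
fake_reply = SimpleNamespace(
    content="Natural Language Processing (NLP) is a subfield of artificial intelligence (AI).",
    metadata={"usage": {"prompt_tokens": 16, "completion_tokens": 61, "total_tokens": 77}},
)
text, usage = first_reply_usage({"replies": [fake_reply]})
print(usage["total_tokens"])
```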

With streaming:

from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import AzureOpenAIChatGenerator
client = AzureOpenAIChatGenerator(streaming_callback=lambda chunk: print(chunk.content, end="", flush=True))
response = client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
)
print(response)

>>> Natural Language Processing (NLP) is a
subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. It involves enabling
computers to understand, interpret, and generate human language, enabling
various applications such as translation, sentiment analysis, chatbots, and
voice assistants.
>>> {'replies': [ChatMessage(content='Natural Language Processing (NLP) is a
subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. It involves enabling
computers to understand, interpret, and generate human language, enabling
various applications such as translation, sentiment analysis, chatbots, and
voice assistants.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None,
metadata={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason':
'stop', 'usage': {'prompt_tokens': 16, 'completion_tokens': 61,
'total_tokens': 77}})]}

In a Pipeline

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

# no init parameters needed: template variables are passed via template_variables at run time
prompt_builder = DynamicChatPromptBuilder()
llm = AzureOpenAIChatGenerator()

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "prompt_source": messages}})

>> {'llm': {'replies': [ChatMessage(content='Berlin ist die Hauptstadt Deutschlands und die größte Stadt des Landes.
>> Es ist eine lebhafte Metropole, die für ihre Geschichte, Kultur und einzigartigen Sehenswürdigkeiten bekannt ist.
>> Berlin bietet eine vielfältige Kulturszene, beeindruckende architektonische Meisterwerke wie den Berliner Dom
>> und das Brandenburger Tor, sowie weltberühmte Museen wie das Pergamonmuseum. Die Stadt hat auch eine pulsierende
>> Clubszene und ist für ihr aufregendes Nachtleben berühmt. Berlin ist ein Schmelztiegel verschiedener Kulturen und
>> zieht jedes Jahr Millionen von Touristen an.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None,
>> metadata={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 32,
>> 'completion_tokens': 153, 'total_tokens': 185}})]}}
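
Pipeline results are keyed by component name, so the replies sit one level deeper than in the standalone case. A small sketch of unpacking them, assuming the output shape printed above; `reply_texts` is a hypothetical helper and the stand-in result is not produced by a real pipeline run.

```python
from types import SimpleNamespace


def reply_texts(pipeline_result, component="llm"):
    """Return the text of every reply produced by the named component."""
    return [reply.content for reply in pipeline_result[component]["replies"]]


# Stand-in with the same nesting as the pipeline output above.
fake_result = {
    "llm": {"replies": [SimpleNamespace(content="Berlin ist die Hauptstadt Deutschlands.")]}
}
print(reply_texts(fake_result))
```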