Tutorial: Generating Structured Output with OpenAI
Last Updated: December 29, 2025
- Level: Beginner
- Time to complete: 15 minutes
- Prerequisites: You need an API key from an active OpenAI account, as this tutorial uses OpenAI GPT models.
- Components Used: OpenAIChatGenerator, OpenAIResponsesChatGenerator
- Goal: Learn how to generate structured outputs with OpenAIChatGenerator or OpenAIResponsesChatGenerator using a Pydantic model or a JSON schema.
Overview
This tutorial shows how to produce structured outputs by providing either a Pydantic model or a JSON schema to OpenAIChatGenerator or OpenAIResponsesChatGenerator.
Note: Only newer OpenAI models, starting with gpt-4o-mini, support this feature.
Installing Dependencies
Install Haystack with pip:
%%bash
pip install -q "haystack-ai>=2.20.0"
Structured Outputs with OpenAIChatGenerator
Using Pydantic Models
First, we'll see how to pass a Pydantic model to OpenAIChatGenerator. For this purpose, we define two Pydantic models, City and CitiesData. These models specify the fields and types of the data structure we want to extract.
from typing import List
from pydantic import BaseModel

class City(BaseModel):
    name: str
    country: str
    population: int

class CitiesData(BaseModel):
    cities: List[City]
You can change these models according to the format you wish to extract from the text.
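For instance, here is a minimal sketch of a variant model: the landmark field is hypothetical, added purely for illustration, and marked optional so that passages that never mention a landmark still validate.

```python
from typing import List, Optional

from pydantic import BaseModel

class City(BaseModel):
    name: str
    country: str
    population: int
    # Hypothetical extra field; defaults to None when the text has no landmark
    landmark: Optional[str] = None

class CitiesData(BaseModel):
    cities: List[City]

# Fields absent from the input simply keep their defaults
berlin = City(name="Berlin", country="Germany", population=3850809)
```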
OpenAIChatGenerator generates text using an OpenAI GPT model by default. We pass our Pydantic model to the response_format parameter in generation_kwargs. We also need to set the OPENAI_API_KEY environment variable.
Note: You can also set response_format in the generation_kwargs param of the chat generator's run method.
import os
from getpass import getpass
from haystack.components.generators.chat import OpenAIChatGenerator
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
chat_generator = OpenAIChatGenerator(generation_kwargs={"response_format": CitiesData})
Running the Component
Run the component with an example passage that you want to convert into JSON. For the given example passage, the generated JSON object should look like this:
{
  "cities": [
    {
      "name": "Berlin",
      "country": "Germany",
      "population": 3850809
    },
    {
      "name": "Paris",
      "country": "France",
      "population": 2161000
    },
    {
      "name": "Lisbon",
      "country": "Portugal",
      "population": 504718
    }
  ]
}
The output of the LLM should be compliant with the structure defined by the CitiesData model.
from haystack.dataclasses import ChatMessage
text = "Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents. Lisbon is the capital and the largest city of Portugal with the population of 504,718."
result = chat_generator.run(messages=[ChatMessage.from_user(text)])
Printing the Correct JSON
If you didn't get any errors, you can now parse and print the structured JSON.
import json
valid_reply = result["replies"][0].text
valid_json = json.loads(valid_reply)
print(valid_json)
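Beyond printing, you can validate the parsed dictionary back into the CitiesData model to get typed attribute access. A self-contained sketch, using a sample reply string in place of the live API response:

```python
import json
from typing import List

from pydantic import BaseModel

class City(BaseModel):
    name: str
    country: str
    population: int

class CitiesData(BaseModel):
    cities: List[City]

# Sample reply standing in for result["replies"][0].text
valid_reply = '{"cities": [{"name": "Berlin", "country": "Germany", "population": 3850809}]}'

# model_validate (Pydantic v2) re-checks field types and raises ValidationError on mismatch
cities_data = CitiesData.model_validate(json.loads(valid_reply))
print(cities_data.cities[0].population)
```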
Using JSON schema
Now, we'll create a JSON schema of the CitiesData model and pass it to OpenAIChatGenerator. OpenAI expects schemas in a specific format, so the schema generated with model_json_schema() cannot be used directly.
For details on how to create schemas for OpenAI, see the OpenAI Structured Outputs guide.
cities_data_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "CitiesData",
        "schema": {
            "type": "object",
            "properties": {
                "cities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "country": {"type": "string"},
                            "population": {"type": "integer"},
                        },
                        "required": ["name", "country", "population"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["cities"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}
Pass this JSON schema to the response_format parameter of the chat generator. We run the generator on its own to see the output.
chat_generator = OpenAIChatGenerator(generation_kwargs={"response_format": cities_data_schema})
text = "Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents. Lisbon is the capital and the largest city of Portugal with the population of 504,718."
result = chat_generator.run(messages=[ChatMessage.from_user(text)])
print(result["replies"][0].text)
Structured Outputs with OpenAIResponsesChatGenerator
Using Pydantic Models
We’ll use the models City and CitiesData defined above.
OpenAIResponsesChatGenerator generates text using OpenAI's gpt-5-mini model by default. We pass our Pydantic model to the text_format parameter in generation_kwargs.
Note: You can set text_format in generation_kwargs in either the init or the run method of the generator.
import os
from getpass import getpass
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
responses_generator = OpenAIResponsesChatGenerator(generation_kwargs={"text_format": CitiesData})
Let’s check the structured output with a simple user message.
responses_generator.run(messages=[ChatMessage.from_user("Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents.")])
Using JSON Schema
Now, we'll create a JSON schema of the CitiesData model and pass it to OpenAIResponsesChatGenerator. We cannot use the same schema we defined for OpenAIChatGenerator, as the OpenAI Responses API expects a different schema format.
For further details, see the documentation.
cities_data_schema_responses = {
    "format": {
        "type": "json_schema",
        "name": "CitiesData",
        "schema": {
            "type": "object",
            "properties": {
                "cities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "country": {"type": "string"},
                            "population": {"type": "integer"},
                        },
                        "required": ["name", "country", "population"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["cities"],
            "additionalProperties": False,
        },
        "strict": True,
    }
}
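The two APIs share the same inner object schema; only the surrounding wrapper differs. A small sketch (with hypothetical variable names chat_schema and responses_schema) that defines the inner schema once and wraps it both ways makes the relationship explicit:

```python
# Inner object schema shared by both APIs
inner_schema = {
    "type": "object",
    "properties": {
        "cities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "country": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "country", "population"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["cities"],
    "additionalProperties": False,
}

# Chat Completions style: nested under "json_schema" (passed via response_format)
chat_schema = {
    "type": "json_schema",
    "json_schema": {"name": "CitiesData", "schema": inner_schema, "strict": True},
}

# Responses style: flattened under "format" (passed via text)
responses_schema = {
    "format": {"type": "json_schema", "name": "CitiesData", "schema": inner_schema, "strict": True},
}
```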
We pass our JSON schema to the text parameter in generation_kwargs.
Note: You can also set text in the generation_kwargs param of the run method of the chat generator.
responses_generator = OpenAIResponsesChatGenerator(generation_kwargs={"text": cities_data_schema_responses})
result = responses_generator.run(messages=[ChatMessage.from_user("Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents.")])
parsed = json.loads(result["replies"][0].text)
print(parsed)
What’s next
🎉 Congratulations! You've learned how to produce structured outputs with OpenAIChatGenerator and OpenAIResponsesChatGenerator using Pydantic models and JSON schemas.
Other chat generators that also support structured outputs:
MistralChatGenerator,
OpenRouterChatGenerator,
NvidiaChatGenerator,
MetaLlamaChatGenerator,
TogetherAIChatGenerator,
LlamaStackChatGenerator and
STACKITChatGenerator.
To stay up to date on the latest Haystack developments, you can subscribe to our newsletter and join the Haystack Discord community.
Thanks for reading!
