Calculating a Hallucination Score with the OpenAIChatGenerator
Last Updated: September 4, 2025
In this cookbook, we show how to calculate a hallucination risk score based on the research paper "LLMs are Bayesian, in Expectation, not in Realization" and its accompanying GitHub repo: https://github.com/leochlon/hallbayes.
We'll use the OpenAIChatGenerator from haystack-experimental, which computes the score alongside each reply.
Setup Environment
%pip install haystack-experimental -q
Set up OpenAI API Key
import os
from getpass import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
Enter OpenAI API key: ········
Closed Book Example
This closed-book example is based on the corresponding example in the original GitHub repo.
from haystack.dataclasses import ChatMessage
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")
closed_book_result = llm.run(
messages=[ChatMessage.from_user(text="Who won the 2019 Nobel Prize in Physics?")],
hallucination_score_config=HallucinationScoreConfig(
skeleton_policy="closed_book" # NOTE: We set "closed_book" here for closed-book hallucination risk calculation
),
)
print(f"Decision: {closed_book_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {closed_book_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {closed_book_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{closed_book_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.000
Rationale: Δ̄=8.2088 nats, B2T=1.8947, ISR=4.332 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.000; y='answer'
Answer:
The 2019 Nobel Prize in Physics was awarded to three scientists for their contributions to understanding the universe. Half of the prize went to James Peebles for his theoretical discoveries in physical cosmology. The other half was jointly awarded to Michel Mayor and Didier Queloz for their discovery of an exoplanet orbiting a solar-type star.
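The rationale string reports the quantities behind the decision: Δ̄ is the information budget in nats, B2T the bits-to-trust, and ISR the information sufficiency ratio, i.e. Δ̄/B2T (8.2088 / 1.8947 ≈ 4.332 above); the generator answers when ISR clears the threshold (thr=1.000), and the EDFL RoH bound is the resulting bound on the risk of hallucination. Here is a minimal sketch, not part of the original notebook, of how you might gate downstream use of a reply on these meta fields:
reply = closed_book_result["replies"][0]
if reply.meta["hallucination_decision"] == "ANSWER":
    # The EDFL risk bound is low enough, so it is safe to surface the answer
    print(reply.text)
else:
    # The generator decided to refuse; surface the risk bound instead
    print(f"Refused (risk bound: {reply.meta['hallucination_risk']:.3f})")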
Evidence-based Example
This evidence-based example is based on the corresponding example in the original GitHub repo.
from haystack.dataclasses import ChatMessage
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")
rag_result = llm.run(
messages=[
ChatMessage.from_user(
text="Task: Answer strictly based on the evidence provided below.\n"
"Question: Who won the Nobel Prize in Physics in 2019?\n"
"Evidence:\n"
"- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
"Constraints: If evidence is insufficient or conflicting, refuse."
)
],
hallucination_score_config=HallucinationScoreConfig(
skeleton_policy="evidence_erase" # NOTE: We set "evidence_erase" here for evidence-based hallucination risk calculation
),
)
print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{rag_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles, who received half of the prize, and to Michel Mayor and Didier Queloz, who shared the other half of the prize.
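If you build evidence-grounded prompts like this in more than one place, a small helper keeps the format consistent. This is a hypothetical convenience function (build_evidence_prompt is not part of haystack-experimental), sketched to mirror the prompt above:
def build_evidence_prompt(question: str, evidence: list[str]) -> str:
    # Reproduces the prompt layout used in the example above
    lines = [
        "Task: Answer strictly based on the evidence provided below.",
        f"Question: {question}",
        "Evidence:",
        *[f"- {item}" for item in evidence],
        "Constraints: If evidence is insufficient or conflicting, refuse.",
    ]
    return "\n".join(lines)
You can then pass ChatMessage.from_user(text=build_evidence_prompt(question, evidence)) to llm.run() exactly as above.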
RAG-based Example
Create a Document Store and index some documents
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
docs = [
Document(content="Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2)"),
Document(content="Nikola Tesla was a Serbian-American engineer, futurist, and inventor. He is known for his contributions to the design of the modern alternating current (AC) electricity supply system.")
]
document_store.write_documents(docs)
2
Create a RAG Question Answering pipeline
from haystack import Pipeline
from haystack.dataclasses import ChatMessage
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
# Create the pipeline
pipe = Pipeline()
# Add components
user_template = """Task: Answer strictly based on the evidence provided below.
Question: {{query}}
Evidence:
{%- for document in documents %}
- {{document.content}}
{%- endfor %}
Constraints: If evidence is insufficient or conflicting, refuse.
"""
pipe.add_component("retriever", InMemoryBM25Retriever(document_store))
pipe.add_component(
"prompt_builder",
ChatPromptBuilder(template=[ChatMessage.from_user(user_template)], required_variables="*")
)
pipe.add_component("llm", OpenAIChatGenerator(model="gpt-4o"))
# Connect the components
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x1426632e0>
🚅 Components
- retriever: InMemoryBM25Retriever
- prompt_builder: ChatPromptBuilder
- llm: OpenAIChatGenerator
🛤️ Connections
- retriever.documents -> prompt_builder.documents (list[Document])
- prompt_builder.prompt -> llm.messages (list[ChatMessage])
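If you're working in a Jupyter notebook, you can also render the pipeline graph with Haystack's built-in visualization (by default this calls out to a Mermaid rendering service, so it needs network access):
pipe.show()  # displays the retriever -> prompt_builder -> llm graph inline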
Run a query that is answerable based on the evidence
query = "Who won the Nobel Prize in Physics in 2019?"
result = pipe.run(
data={
"retriever": {"query": query},
"prompt_builder": {"query": query},
"llm": {
"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
}
}
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles (1/2), and Michel Mayor & Didier Queloz (1/2).
Run a query that should not be answered
query = "Who won the Nobel Prize in Physics in 2022?"
result = pipe.run(
data={
"retriever": {"query": query},
"prompt_builder": {"query": query},
"llm": {
"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
}
}
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: REFUSE
Risk bound: 1.000
Rationale: Δ̄=0.0000 nats, B2T=1.8947, ISR=0.000 (thr=1.000), extra_bits=0.200; EDFL RoH bound=1.000; y='refuse'
Answer:
The evidence provided does not include information about the Nobel Prize in Physics for the year 2022. Therefore, I cannot answer the question based on the evidence provided.
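Putting the two runs together, here is a small sketch, using only the pipeline and configuration defined above, that scores several queries in a loop and reports the decision and risk bound for each:
queries = [
    "Who won the Nobel Prize in Physics in 2019?",
    "Who won the Nobel Prize in Physics in 2022?",
]
for query in queries:
    result = pipe.run(
        data={
            "retriever": {"query": query},
            "prompt_builder": {"query": query},
            "llm": {"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")},
        }
    )
    reply = result["llm"]["replies"][0]
    # Expect ANSWER for 2019 (evidence was indexed above) and REFUSE for 2022
    print(f"{query} -> {reply.meta['hallucination_decision']} (risk bound: {reply.meta['hallucination_risk']:.3f})")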