Calculating a Hallucination Score with the OpenAIChatGenerator
Last Updated: September 4, 2025
In this cookbook, we show how to calculate a hallucination risk score based on the research paper "LLMs are Bayesian, in Expectation, not in Realization" and its accompanying GitHub repo: https://github.com/leochlon/hallbayes.
We'll use the OpenAIChatGenerator from haystack-experimental, which computes the score alongside each reply.
Setup Environment
%pip install haystack-experimental -q
Set up OpenAI API Key
import os
from getpass import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
Enter OpenAI API key: ········
Closed Book Example
This closed-book example is based on the corresponding example in the original GitHub repo.
from haystack.dataclasses import ChatMessage
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")
closed_book_result = llm.run(
messages=[ChatMessage.from_user(text="Who won the 2019 Nobel Prize in Physics?")],
hallucination_score_config=HallucinationScoreConfig(
skeleton_policy="closed_book" # NOTE: We set "closed_book" here for closed-book hallucination risk calculation
),
)
print(f"Decision: {closed_book_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {closed_book_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {closed_book_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{closed_book_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.000
Rationale: Δ̄=8.2088 nats, B2T=1.8947, ISR=4.332 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.000; y='answer'
Answer:
The 2019 Nobel Prize in Physics was awarded to three scientists for their contributions to understanding the universe. Half of the prize went to James Peebles for his theoretical discoveries in physical cosmology. The other half was jointly awarded to Michel Mayor and Didier Queloz for their discovery of an exoplanet orbiting a solar-type star.
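The rationale string reports the quantities behind the decision: Δ̄ is the information budget in nats, B2T the bits-to-trust, and ISR the information sufficiency ratio, i.e. Δ̄/B2T (8.2088 / 1.8947 ≈ 4.332 above); the generator answers when ISR clears the threshold (thr=1.000), and the EDFL RoH bound is the resulting bound on the risk of hallucination. Here is a minimal sketch, not part of the original notebook, of how you might gate downstream use of a reply on these meta fields:
reply = closed_book_result["replies"][0]
if reply.meta["hallucination_decision"] == "ANSWER":
    # The EDFL risk bound is low enough, so it is safe to surface the answer
    print(reply.text)
else:
    # The generator decided to refuse; surface the risk bound instead
    print(f"Refused (risk bound: {reply.meta['hallucination_risk']:.3f})")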
Evidence-based Example
This evidence-based example is based on the corresponding example in the original GitHub repo.
from haystack.dataclasses import ChatMessage
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")
rag_result = llm.run(
messages=[
ChatMessage.from_user(
text="Task: Answer strictly based on the evidence provided below.\n"
"Question: Who won the Nobel Prize in Physics in 2019?\n"
"Evidence:\n"
"- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
"Constraints: If evidence is insufficient or conflicting, refuse."
)
],
hallucination_score_config=HallucinationScoreConfig(
skeleton_policy="evidence_erase" # NOTE: We set "evidence_erase" here for evidence-based hallucination risk calculation
),
)
print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{rag_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles, who received half of the prize, and to Michel Mayor and Didier Queloz, who shared the other half of the prize.
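If you build evidence-grounded prompts like this in more than one place, a small helper keeps the format consistent. This is a hypothetical convenience function (build_evidence_prompt is not part of haystack-experimental), sketched to mirror the prompt above:
def build_evidence_prompt(question: str, evidence: list[str]) -> str:
    # Reproduces the prompt layout used in the example above
    lines = [
        "Task: Answer strictly based on the evidence provided below.",
        f"Question: {question}",
        "Evidence:",
        *[f"- {item}" for item in evidence],
        "Constraints: If evidence is insufficient or conflicting, refuse.",
    ]
    return "\n".join(lines)
You can then pass ChatMessage.from_user(text=build_evidence_prompt(question, evidence)) to llm.run() exactly as above.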
RAG-based Example
Create a Document Store and index some documents
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
docs = [
Document(content="Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2)"),
Document(content="Nikola Tesla was a Serbian-American engineer, futurist, and inventor. He is known for his contributions to the design of the modern alternating current (AC) electricity supply system.")
]
document_store.write_documents(docs)
2
Create a RAG Question Answering pipeline
from haystack import Pipeline
from haystack.dataclasses import ChatMessage
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
# Create the pipeline
pipe = Pipeline()
# Add components
user_template = """Task: Answer strictly based on the evidence provided below.
Question: {{query}}
Evidence:
{%- for document in documents %}
- {{document.content}}
{%- endfor %}
Constraints: If evidence is insufficient or conflicting, refuse.
"""
pipe.add_component("retriever", InMemoryBM25Retriever(document_store))
pipe.add_component(
"prompt_builder",
ChatPromptBuilder(template=[ChatMessage.from_user(user_template)], required_variables="*")
)
pipe.add_component("llm", OpenAIChatGenerator(model="gpt-4o"))
# Connect the components
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x1426632e0>
🚅 Components
- retriever: InMemoryBM25Retriever
- prompt_builder: ChatPromptBuilder
- llm: OpenAIChatGenerator
🛤️ Connections
- retriever.documents -> prompt_builder.documents (list[Document])
- prompt_builder.prompt -> llm.messages (list[ChatMessage])
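If you're working in a Jupyter notebook, you can also render the pipeline graph with Haystack's built-in visualization (by default this calls out to a Mermaid rendering service, so it needs network access):
pipe.show()  # displays the retriever -> prompt_builder -> llm graph inline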
Run a query that is answerable based on the evidence
query = "Who won the Nobel Prize in Physics in 2019?"
result = pipe.run(
data={
"retriever": {"query": query},
"prompt_builder": {"query": query},
"llm": {
"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
}
}
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles (1/2), and Michel Mayor & Didier Queloz (1/2).
Run a query that should not be answered
query = "Who won the Nobel Prize in Physics in 2022?"
result = pipe.run(
data={
"retriever": {"query": query},
"prompt_builder": {"query": query},
"llm": {
"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
}
}
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: REFUSE
Risk bound: 1.000
Rationale: Δ̄=0.0000 nats, B2T=1.8947, ISR=0.000 (thr=1.000), extra_bits=0.200; EDFL RoH bound=1.000; y='refuse'
Answer:
The evidence provided does not include information about the Nobel Prize in Physics for the year 2022. Therefore, I cannot answer the question based on the evidence provided.
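Putting the two runs together, here is a small sketch, using only the pipeline and configuration defined above, that scores several queries in a loop and reports the decision and risk bound for each:
queries = [
    "Who won the Nobel Prize in Physics in 2019?",
    "Who won the Nobel Prize in Physics in 2022?",
]
for query in queries:
    result = pipe.run(
        data={
            "retriever": {"query": query},
            "prompt_builder": {"query": query},
            "llm": {"hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")},
        }
    )
    reply = result["llm"]["replies"][0]
    # Expect ANSWER for 2019 (evidence was indexed above) and REFUSE for 2022
    print(f"{query} -> {reply.meta['hallucination_decision']} (risk bound: {reply.meta['hallucination_risk']:.3f})")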