
UpTrainEvaluator

The UpTrainEvaluator evaluates Haystack Pipelines using LLM-based metrics. It supports metrics like context relevance, factual accuracy, response relevance, and more.

Name: UpTrainEvaluator
Path: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/uptrain
Most common position in a Pipeline: On its own or in an evaluation Pipeline, used after a separate Pipeline has generated the inputs for the evaluator.
Mandatory input variables: β€œinputs”: a keyword arguments dictionary containing the expected inputs. The expected inputs change based on what metric you are evaluating. See below for more details.
Output variables: β€œresults”: a nested list of metric results. There can be one or more results, depending on the metric. Each result is a dictionary containing:
- name - The name of the metric.
- score - The score of the metric.
- explanation - An optional explanation of the score.
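
For illustration, a single entry in that nested list could look like the following (the values are invented, and the exact name string depends on the metric you evaluate):

{"name": "context_relevance", "score": 1.0, "explanation": "The context directly answers the question."}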

UpTrain is an evaluation framework that provides a number of LLM-based evaluation metrics. You can use the UpTrainEvaluator component to evaluate a Haystack Pipeline, such as a retrieval-augmented generation (RAG) Pipeline, against one of the metrics provided by UpTrain.

Supported Metrics

UpTrain supports a number of metrics, which we expose through the UpTrainMetric enumeration. Below is the list of metrics supported by the UpTrainEvaluator in Haystack, along with the metric_params expected when initializing the evaluator and the inputs expected when running it.

For a complete guide on these metrics, visit the UpTrain documentation.

| Metric | Metric Parameters | Expected inputs | Metric description |
| --- | --- | --- | --- |
| CONTEXT_RELEVANCE | None | questions: List[str], contexts: List[List[str]] | Grades how relevant the context was to the question specified. |
| FACTUAL_ACCURACY | None | questions: List[str], contexts: List[List[str]], responses: List[str] | Grades how factual the generated response was. |
| RESPONSE_RELEVANCE | None | questions: List[str], responses: List[str] | Grades how relevant the generated response is or if it has any additional irrelevant information for the question asked. |
| RESPONSE_COMPLETENESS | None | questions: List[str], responses: List[str] | Grades how complete the generated response was for the question specified. |
| RESPONSE_COMPLETENESS_WRT_CONTEXT | None | questions: List[str], contexts: List[List[str]], responses: List[str] | Grades how complete the generated response was for the question specified, given the information provided in the context. |
| RESPONSE_CONSISTENCY | None | questions: List[str], contexts: List[List[str]], responses: List[str] | Grades how consistent the response is with the question asked as well as with the context provided. |
| RESPONSE_CONCISENESS | None | questions: List[str], responses: List[str] | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
| CRITIQUE_LANGUAGE | None | responses: List[str] | Evaluates the response on multiple aspects - fluency, politeness, grammar, and coherence. Provides a score for each aspect on a scale of 0 to 1, along with an explanation for the score. |
| CRITIQUE_TONE | llm_persona | responses: List[str] | Assesses the tone of machine-generated responses. |
| GUIDELINE_ADHERENCE | guideline, guideline_name, guideline_schema | questions: List[str], responses: List[str] | Grades how well the LLM adheres to a provided guideline when giving a response. |
| RESPONSE_MATCHING | method | responses: List[str], ground_truths: List[str] | Compares the LLM-generated text with the gold (ideal) response using the defined score metric. |
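
As a quick illustration of how the metric_params in the table map onto the evaluator, here is a minimal initialization sketch for GUIDELINE_ADHERENCE. The guideline text and name are made-up examples, and whether the optional guideline_schema parameter is needed may depend on your UpTrain version:

from haystack_integrations.components.evaluators.uptrain import UpTrainEvaluator, UpTrainMetric

# Minimal sketch: the guideline text and guideline_name below are invented examples.
# GUIDELINE_ADHERENCE also accepts a guideline_schema parameter (see the table above).
evaluator = UpTrainEvaluator(
    metric=UpTrainMetric.GUIDELINE_ADHERENCE,
    metric_params={
        "guideline": "The response must not include personal opinions.",
        "guideline_name": "no_personal_opinions",
    },
)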

Parameters Overview

To initialize an UpTrainEvaluator, you need to provide the following parameters (a short initialization sketch follows the list):

  • metric: An UpTrainMetric.
  • metric_params: Optionally, if the metric calls for any additional parameters, you should provide them here.
  • api: The API you want to use with your evaluator, set to openai by default. Another supported API is uptrain. Check out the UpTrain docs for any changes to supported APIs.
  • api_key: By default, this component looks for an environment variable called OPENAI_API_KEY. To change this, pass Secret.from_env_var("YOUR_ENV_VAR") to this parameter.
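
For example, a minimal initialization that reads the API key from a custom environment variable could look like this (the variable name MY_OPENAI_API_KEY and the chosen metric are only illustrative):

from haystack.utils import Secret
from haystack_integrations.components.evaluators.uptrain import UpTrainEvaluator, UpTrainMetric

# Minimal sketch: MY_OPENAI_API_KEY is a made-up environment variable name.
evaluator = UpTrainEvaluator(
    metric=UpTrainMetric.FACTUAL_ACCURACY,
    api="openai",
    api_key=Secret.from_env_var("MY_OPENAI_API_KEY"),
)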

Usage

To use the UpTrainEvaluator, you first need to install the integration:

pip install uptrain-haystack

To use the UpTrainEvaluator you need to follow these steps:

  1. Initialize the UpTrainEvaluator while providing the correct metric_params for the metric you are using.
  2. Run the UpTrainEvaluator, either on its own or in a Pipeline, by providing the expected input for the metric you are using.

Examples

Evaluate Context Relevance

To create a context relevance evaluation Pipeline:

import os
from haystack import Pipeline
from haystack_integrations.components.evaluators.uptrain import UpTrainEvaluator, UpTrainMetric

os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_KEY'

evaluator = UpTrainEvaluator(
    metric=UpTrainMetric.CONTEXT_RELEVANCE,
    api="openai",
)

evaluator_pipeline = Pipeline()
evaluator_pipeline.add_component("evaluator", evaluator)

To run the evaluation Pipeline, you should have the expected inputs for the metric ready at hand. This metric expects a list of questions and contexts, which should come from the results of the Pipeline you want to evaluate.

results = evaluator_pipeline.run({"evaluator": {"questions": ["When was the Rhodes Statue built?", "Where is the Pyramid of Giza?"],
                                                "contexts": [["Context for question 1", "Context 1"], ["Context for question 2", "Context 2"]]}})

Critique Tone

To create an evaluation Pipeline that critiques whether the tone of the response is β€œinformative”:

import os
from haystack import Pipeline
from haystack_integrations.components.evaluators.uptrain import UpTrainEvaluator, UpTrainMetric

os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_KEY'

evaluator = UpTrainEvaluator(
    metric=UpTrainMetric.CRITIQUE_TONE,
    api="openai",
    metric_params={"llm_persona": "informative"}
)

evaluator_pipeline = Pipeline()
evaluator_pipeline.add_component("evaluator", evaluator)

To run this evaluation Pipeline, you should have the expected inputs for the metric ready at hand. This metric expects a list of responses, which should come from the results of the Pipeline you want to evaluate.

evaluation_results = evaluator_pipeline.run({"evaluator": {"responses": ["The Rhodes Statue was built in 280 BC."]}})

Related Links

Check out the API reference: