

The Reader takes a question and a set of Documents as input and returns an Answer by selecting a text span within the Documents. The Reader is also known as an Open-Domain QA system in Machine Learning speak.

Position in a Pipeline: Generally after a Retriever

Pros
  • Built on the latest transformer-based language models
  • Strong in their grasp of semantics
  • Sensitive to syntactic structure
  • State-of-the-art in QA tasks like SQuAD and Natural Questions

Haystack Readers contain all the components of end-to-end, open-domain QA systems, including:

  • Loading of model weights
  • Tokenization
  • Embedding computation
  • Span prediction
  • Candidate aggregation


Cons
  • Requires a GPU to run quickly


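To initialize a reader, run:

from haystack.nodes import FARMReader
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)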

To run a reader on its own, use the predict() method:

result = reader.predict(
    query="Which country is Canberra located in?",
    documents=documents,  # a list of Document objects, for example returned by a Retriever
)

This returns a dictionary in the following format:

{
    'query': 'Which country is Canberra located in?',
    'answers': [
        {'answer': 'Australia',
         'context': "Canberra, federal capital of the Commonwealth of Australia. It occupies part of the Australian Capital Territory (ACT),",
         'offset_answer_start': 147,
         'offset_answer_end': 154,
         'score': 0.9787139466668613,
         'document_id': '1337'
        },
        ...
    ]
}

If you want to set up Haystack as a service, use the Reader in a pipeline:

from haystack.pipelines import ExtractiveQAPipeline

# Assumes `reader` and `retriever` have already been initialized
pipe = ExtractiveQAPipeline(reader, retriever)
prediction = pipe.run(
    query='Which country is Canberra located in?',
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 10}}
)


With the TableReader, you can get answers to your questions even if the answer is buried in a table. It is designed to use the TAPAS model created by Google.

These models are able to return a single cell as an answer or pick a set of cells and then perform an aggregation operation to form a final answer. To find out more, have a look at our guide on Table Question Answering.
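To illustrate the final aggregation step, here is a minimal sketch in plain Python. It assumes the model has already selected a set of cells and predicted an operator; the operator names NONE, SUM, AVERAGE, and COUNT follow the aggregation labels of TAPAS models fine-tuned on WikiTableQuestions. The helper `aggregate_cells` and the toy table are hypothetical and do not call Haystack or TAPAS.

```python
from typing import List, Union


def aggregate_cells(cells: List[str], operator: str) -> Union[str, int, float]:
    """Turn the cells selected by a table QA model into a final answer (hypothetical helper)."""
    if operator == "NONE":
        # No aggregation: the answer is the text of the selected cell(s).
        return ", ".join(cells)
    if operator == "COUNT":
        return len(cells)
    # SUM and AVERAGE only make sense for numeric cells.
    numbers = [float(cell.replace(",", "")) for cell in cells]
    if operator == "SUM":
        return sum(numbers)
    if operator == "AVERAGE":
        return sum(numbers) / len(numbers)
    raise ValueError(f"Unknown aggregation operator: {operator}")


# A toy table (team, goals). Suppose the model selected the goal cells of the
# first two rows and predicted SUM for "How many goals did Team A and Team B score?"
table = [["Team A", "3"], ["Team B", "5"], ["Team C", "2"]]
selected = [row[1] for row in table[:2]]
print(aggregate_cells(selected, "SUM"))  # → 8.0
```

In a TAPAS model, both the cell selection and the operator come from the model. This sketch only covers what happens once they are known.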


Reader models differ in their strengths and weaknesses. Larger models are generally more accurate but slower. Models trained on different data may be better suited to certain domains. For more information about models, see Language Models.

You can find many open source Reader models on the Hugging Face Model Hub. If you pass a model's Model Hub name when initializing the Reader, Haystack automatically downloads and loads the model.

Compatible models

The Reader supports extractive question answering models that have a BERT-based architecture such as:

  • BERT
  • RoBERTa
  • MiniLM
  • XLM
  • DistilBERT
  • DeBERTa

If you're using a sequence-to-sequence model such as BART, use the Answer Generator instead.

If you're unsure which Reader model to use, here are our recommendations that you can start with:

Fine-tuning, Saving, Loading, and Converting

In Haystack, it is possible to fine-tune your FARMReader model on any SQuAD format QA dataset. To kick off training, call the train() method. This method also saves your model in the specified directory.
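A SQuAD-format file is a JSON document in which each paragraph's context is paired with questions and character-level answer offsets. The sketch below builds a one-example training file with the standard library. The helper `make_squad_example`, the file name `train.json`, and the example text are illustrative assumptions. The helper computes `answer_start` from the context and checks it, because a wrong offset silently produces bad training data.

```python
import json
from typing import Any, Dict


def make_squad_example(context: str, question: str, answer: str, qa_id: str) -> Dict[str, Any]:
    """Build one SQuAD v2.0-style paragraph entry (hypothetical helper)."""
    start = context.find(answer)
    if start == -1:
        raise ValueError(f"Answer {answer!r} not found in context")
    # Sanity check: the character offset must point exactly at the answer text.
    assert context[start:start + len(answer)] == answer
    return {
        "context": context,
        "qas": [
            {
                "id": qa_id,
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
                "is_impossible": False,  # SQuAD 2.0 flag for unanswerable questions
            }
        ],
    }


context = "Canberra is the capital city of Australia."
paragraph = make_squad_example(context, "Which country is Canberra located in?", "Australia", "q1")
dataset = {"version": "v2.0", "data": [{"title": "Canberra", "paragraphs": [paragraph]}]}

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```

You would then pass the directory containing this file as `data_dir` and `train.json` as `train_filename` to `train()`.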


If you want to load the model at a later point, initialize a FARMReader object as follows:

new_reader = FARMReader(model_name_or_path="my_model")

To convert your model from or into the Hugging Face Transformers format, use a conversion function. Calling reader.inferencer.model.convert_to_transformers() returns a list of Hugging Face models. This can be particularly useful if you want to upload the model to the Hugging Face Model Hub.

transformers_models = reader.inferencer.model.convert_to_transformers()

Instead of defining a fixed number of training epochs, you can train the model using a method called early stopping. This method performs cycles of training and evaluation until the model is no longer improving. To use this approach, run:

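from haystack.nodes import FARMReader
from haystack.utils.early_stopping import EarlyStopping

reader = FARMReader(model_name_or_path='deepset/roberta-base-squad2', use_gpu=True)
# Illustrative settings: stop once the evaluation loss has not improved for 3 evaluations
early_stopping = EarlyStopping(metric="loss", mode="min", patience=3, save_dir="my_model")
reader.train(
    data_dir="PATH/TO_YOUR/DATA",
    train_filename="train.json",
    dev_filename="dev.json",  # early stopping needs evaluation data
    early_stopping=early_stopping,
)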

You can find more details about the EarlyStopping class in the EarlyStopping API documentation.
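The logic behind early stopping fits in a few lines. The class below is a simplified, hypothetical stand-in written for illustration. It is not Haystack's `EarlyStopping` implementation. It assumes one metric value per evaluation and uses `mode`, `patience`, and `min_delta` in their usual sense: training stops once the metric has failed to improve by at least `min_delta` for more than `patience` consecutive evaluations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleEarlyStopping:
    """Hypothetical early-stopping tracker; not Haystack's EarlyStopping class."""

    mode: str = "max"         # "max" for metrics such as F1, "min" for a loss
    patience: int = 1         # evaluations without improvement tolerated before stopping
    min_delta: float = 0.001  # smallest change that counts as an improvement
    best: Optional[float] = None
    evals_without_improvement: int = 0

    def _is_improvement(self, value: float) -> bool:
        if self.best is None:
            return True
        if self.mode == "max":
            return value > self.best + self.min_delta
        return value < self.best - self.min_delta

    def update(self, value: float) -> bool:
        """Record one evaluation result and return True if training should stop."""
        if self._is_improvement(value):
            self.best = value
            self.evals_without_improvement = 0
            # A real trainer would save a checkpoint of the best model here.
        else:
            self.evals_without_improvement += 1
        return self.evals_without_improvement > self.patience


# Toy F1 scores from successive evaluations on a dev set.
stopper = SimpleEarlyStopping(mode="max", patience=1)
for step, f1 in enumerate([0.61, 0.66, 0.665, 0.664, 0.66]):
    if stopper.update(f1):
        print(f"Stop after evaluation {step}; best F1 = {stopper.best}")  # prints "Stop after evaluation 4; best F1 = 0.665"
        break
```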

Tutorial: For a hands-on example, check out our tutorial on fine-tuning.

Confidence Scores

When printing the full results of a Reader, each prediction is accompanied by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.

In the output of print_answers(), you can find the model's confidence score in a dictionary key called score.

from haystack.utils import print_answers

print_answers(prediction, details="all")

This prints output similar to the following (truncated):

'answers': [
    {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. '
                   'She travels with her father, Eddard, to '
                   "King's Landing when he is made Hand of the "
                   'King. Before she leaves,',
        'score': 0.9899835586547852,
        ...

The intuition behind this score is the following: if a model's average confidence score is 0.9, we can expect its predictions to be correct in about 9 out of 10 cases. However, if the model's training data differs strongly from the data it needs to make predictions on, there is no guarantee that the confidence score and the model's accuracy are well aligned.

To better align the confidence score with the model's accuracy on your data, you can calibrate the scores on a specific, labeled dataset. The Reader has a method calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin) for this purpose. Its parameters are the same as those of the eval() method, because calibration is performed on a dataset that comes with gold labels: calibrate_confidence_scores() calls eval() internally and therefore needs a DocumentStore containing labeled questions and evaluation documents.
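One way to check how well confidence and accuracy line up is to bin predictions by confidence and compare, within each bin, the average confidence with the fraction of correct predictions. The sketch below computes the weighted gap, known as the expected calibration error (ECE), in plain Python. The function name, the ten equal-width bins, and the toy predictions are assumptions for illustration. It is a diagnostic for measuring calibration, not a reimplementation of calibrate_confidence_scores().

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between confidence and accuracy across confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("Need the same, non-zero number of confidences and labels")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append(i)
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Ten predictions with confidence 0.9, nine of them correct: perfectly calibrated.
print(round(expected_calibration_error([0.9] * 10, [True] * 9 + [False]), 3))  # → 0.0
# Same confidence, but only six correct: the model is overconfident.
print(round(expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4), 3))
```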

Have a look at this FARM tutorial to see how calibrated confidence scores compare with uncalibrated ones in FARM. Note that calibrated confidence scores are specific to the domain they were calibrated on. There is no guarantee that the calibration transfers to a new domain.

A confidence score is particularly useful when you need Haystack to work above a certain accuracy threshold. Many of our users have built systems where predictions below a certain confidence value are routed to a fallback system.
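A fallback of this kind needs little more than a score threshold. The sketch below assumes answers shaped like the dictionaries returned by predict() (each with an `answer` and a `score` key). The `route_answer` function and the threshold of 0.7 are hypothetical. A suitable threshold depends on your own, ideally calibrated, scores.

```python
from typing import Any, Dict, List, Optional

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value; tune it on labeled data from your domain


def route_answer(answers: List[Dict[str, Any]], threshold: float = CONFIDENCE_THRESHOLD) -> Dict[str, Any]:
    """Return the best answer if it is confident enough, otherwise signal a fallback."""
    best: Optional[Dict[str, Any]] = max(answers, key=lambda a: a["score"], default=None)
    if best is not None and best["score"] >= threshold:
        return {"route": "answer", "answer": best["answer"], "score": best["score"]}
    # Low confidence (or no answer at all): hand the query to a fallback system,
    # for example a human agent or a keyword search page.
    return {"route": "fallback", "answer": None, "score": best["score"] if best else None}


answers = [
    {"answer": "Australia", "score": 0.98},
    {"answer": "New Zealand", "score": 0.01},
]
print(route_answer(answers)["route"])  # → answer
print(route_answer([{"answer": "Sydney", "score": 0.35}])["route"])  # → fallback
```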

Deeper Dive: FARM vs Transformers

Apart from the model weights, Haystack Readers contain all the components found in end-to-end open-domain QA systems: tokenization, embedding computation, span prediction, and candidate aggregation. The FARM and Transformers libraries handle weights in the same way, but their QA pipelines differ. The major differences are:

  • The TransformersReader sometimes predicts the same span twice while the FARMReader removes duplicates.

  • The FARMReader currently uses the tokenizers from the Hugging Face Transformers library, while the TransformersReader uses the tokenizers from the Hugging Face Tokenizers library.

  • In the TransformersReader, start and end logits are normalized per passage and then multiplied. In the FARMReader, they are summed without normalization.
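To make the last difference concrete, the sketch below scores the same candidate span in the two ways described above, using made-up logits for a single passage. The first approach (TransformersReader style) multiplies softmax-normalized start and end probabilities. The second (FARMReader style) adds the raw start and end logits. It is a simplified illustration of the two formulas, not code from either library.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the logits of one passage."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transformers_style_score(start_logits: List[float], end_logits: List[float], start: int, end: int) -> float:
    # Normalize start and end logits within the passage, then multiply the probabilities.
    return softmax(start_logits)[start] * softmax(end_logits)[end]


def farm_style_score(start_logits: List[float], end_logits: List[float], start: int, end: int) -> float:
    # No normalization: add the raw start and end logits.
    return start_logits[start] + end_logits[end]


# Made-up logits for a four-token passage; the candidate span covers tokens 1 to 2.
start_logits = [0.1, 3.0, 0.5, -1.0]
end_logits = [-0.5, 0.2, 2.5, 0.0]
print(transformers_style_score(start_logits, end_logits, 1, 2))
print(farm_style_score(start_logits, end_logits, 1, 2))  # → 5.5
```

The softmax probabilities always sum to 1 within a passage, however strong or weak the evidence in that passage is, whereas summed logits keep their raw scale.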

If you’re interested in the finer details of these points, have a look at this GitHub comment.

We see value in maintaining both kinds of Readers: Transformers is a familiar library to many Haystack users, while we at deepset can more easily update and optimize the FARM pipeline for speed and performance.

Haystack is also closely integrated with FARM, which means you can further fine-tune your Readers on labeled data using a FARMReader. See our tutorials for an end-to-end example, or the shortened example below.

from haystack.nodes import FARMReader
# Initialize the Reader
model = "deepset/roberta-base-squad2"
reader = FARMReader(model)
# Perform fine-tuning
train_data = "PATH/TO_YOUR/TRAIN_DATA"
train_filename = "train.json"
save_dir = "finetuned_model"
reader.train(train_data, train_filename, save_dir=save_dir)
# Load
finetuned_reader = FARMReader(save_dir)

Deeper Dive: From Language Model to Haystack Reader

Language models form the core of most modern NLP systems, including the Readers in Haystack. They build a general understanding of language through training tasks such as Masked Language Modeling or Replaced Token Detection on large amounts of text. Well-trained language models capture the word distribution of one or more languages and, more importantly, convert input text into word vectors that capture elements of syntax and semantics.
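As an illustration of the Masked Language Modeling objective, the sketch below hides a random subset of tokens and keeps the originals as prediction targets. It is deliberately simplified. Whitespace tokenization, a 15% masking rate, and always substituting `[MASK]` are assumptions; BERT, for example, replaces some selected tokens with random tokens or leaves them unchanged. No model is involved.

```python
import random
from typing import Dict, List, Tuple


def mask_tokens(
    tokens: List[str], mask_rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Replace a random subset of tokens with [MASK]; return the masked sequence and targets."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for pos in positions:
        targets[pos] = masked[pos]  # the model is trained to predict this original token
        masked[pos] = "[MASK]"
    return masked, targets


tokens = "Canberra is the federal capital of the Commonwealth of Australia".split()
masked, targets = mask_tokens(tokens)
print(" ".join(masked))
print(targets)
```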

To convert a language model into a Reader model, you must first train it on a Question Answering dataset. To do so, you must add a question answering prediction head on top of the language model. You can think of it as a token classification task where every input token is assigned a probability of being the start or end token of the correct answer. If a passage doesn't contain the answer, the prediction head should return a no_answer prediction.
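The sketch below shows, under simplifying assumptions, how the start and end logits from such a prediction head can be turned into an answer. It searches all valid spans (end not before start, at most `max_answer_len` tokens), scores each span as start logit plus end logit, and compares the best span with a no-answer score taken from position 0, the [CLS] token, as is common for SQuAD 2.0-style heads. The function name, the logits, and the exact no-answer rule are illustrative assumptions, question tokens are ignored, and this is not Haystack's implementation.

```python
from typing import List, Optional, Tuple


def best_span(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 5,
) -> Tuple[Optional[str], float]:
    """Pick the highest-scoring span, or None if the no-answer score wins.

    Position 0 is assumed to be the [CLS] token used to represent "no answer".
    """
    no_answer_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    for s in range(1, len(tokens)):
        # Only consider ends at or after the start and within max_answer_len tokens.
        for e in range(s, min(s + max_answer_len, len(tokens))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    if best is None or no_answer_score > best_score:
        return None, no_answer_score
    s, e = best
    return " ".join(tokens[s : e + 1]), best_score


tokens = ["[CLS]", "Canberra", "is", "in", "Australia", "."]
start_logits = [0.5, 0.1, -1.0, 0.0, 4.0, -2.0]
end_logits = [0.5, -0.5, -1.0, 0.2, 3.5, 0.1]
print(best_span(tokens, start_logits, end_logits))  # → ('Australia', 7.5)
```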

Because language models can process only a limited number of tokens in a single forward pass, we implemented a sliding window mechanism to handle documents of variable length. This mechanism slices the document into overlapping passages of approximately max_seq_len tokens, each offset from the previous one by doc_stride tokens. You can set these parameters when initializing the Reader:

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", max_seq_len=256, doc_stride=128)

Predictions are made on each individual passage and the process of aggregation picks the best candidates across all passages. To learn about what is happening behind the scenes, have a look at Modern Question Answering Systems Explained.
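The sliding window and the aggregation step can be sketched together. The code below cuts a token list into windows of `max_seq_len` tokens that start `doc_stride` tokens apart, as described above, and then keeps the best-scoring candidate across all windows. Real Readers also reserve room in each window for the question and special tokens and map token positions back to character offsets; those details are omitted. `toy_scorer` is a hypothetical stand-in for a model.

```python
from typing import Callable, List, Optional, Tuple

# (score, start, end): token positions relative to the window they were found in.
Candidate = Tuple[float, int, int]


def split_into_windows(
    tokens: List[str], max_seq_len: int, doc_stride: int
) -> List[Tuple[int, List[str]]]:
    """Cut tokens into windows of max_seq_len; consecutive windows start doc_stride apart."""
    if not 0 < doc_stride <= max_seq_len:
        raise ValueError("doc_stride must be between 1 and max_seq_len")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start : start + max_seq_len]))
        if start + max_seq_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += doc_stride
    return windows


def answer_document(
    tokens: List[str],
    score_window: Callable[[List[str]], Optional[Candidate]],
    max_seq_len: int = 8,
    doc_stride: int = 4,
) -> Optional[Tuple[float, str]]:
    """Score every window and keep the best candidate across all of them."""
    best: Optional[Tuple[float, str]] = None
    for offset, window in split_into_windows(tokens, max_seq_len, doc_stride):
        candidate = score_window(window)
        if candidate is None:
            continue
        score, s, e = candidate
        # Map window-relative positions back to the full document.
        text = " ".join(tokens[offset + s : offset + e + 1])
        if best is None or score > best[0]:
            best = (score, text)
    return best


def toy_scorer(window: List[str]) -> Optional[Candidate]:
    """Hypothetical stand-in for a Reader model: answer 'Australia', scored by nearby evidence."""
    if "Australia" not in window:
        return None
    pos = window.index("Australia")
    return 1.0 + window.count("capital"), pos, pos


doc = ("Canberra is the federal capital of the Commonwealth of Australia . "
       "It occupies part of the Australian Capital Territory").split()
print(answer_document(doc, toy_scorer))  # → (2.0, 'Australia')
```

Because the windows overlap, the same span can be found more than once (here in two windows). Keeping only the best-scoring copy is one simple way to avoid duplicate answers.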