A Ranker reorders a set of Documents based on their relevance to the Query. It is particularly useful when your Retriever has high recall but poor relevance scoring. The improvement that the Ranker brings comes at the cost of some additional computation time. The ranking models supported by Haystack are models powered by transformers, meaning that they are sensitive to word order and syntax.
Position in a Pipeline: After a Retriever
To use the Ranker in a pipeline:
    from haystack.document_stores import ElasticsearchDocumentStore
    from haystack.nodes import ElasticsearchRetriever, SentenceTransformersRanker
    from haystack import Pipeline

    document_store = ElasticsearchDocumentStore()
    ...
    retriever = ElasticsearchRetriever(document_store)
    ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
    ...
    p = Pipeline()
    p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
    p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])
SentenceTransformersRanker can also be used in isolation by calling its predict() method after initialization.
As an example, a Ranker can pair nicely with a sparse BM25 retriever such as the ElasticsearchRetriever. While the BM25 retriever is fast and lightweight, it is not sensitive to word order but rather treats text as a bag of words. By placing a Ranker afterwards, you can offset this weakness and have a better sorted list of relevant documents.
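To make the bag-of-words weakness concrete, here is a minimal, self-contained sketch that does not use Haystack at all. The helper names (`bag_of_words_score`, `bigram_overlap_score`, `rerank`) are hypothetical and exist only for this illustration. The term-overlap count is a simplified stand-in for BM25, and the bigram overlap is a toy stand-in for a transformer-based ranking model. The only point is that an order-insensitive scorer cannot separate two documents containing the same words, while an order-aware second stage can.

```python
from collections import Counter


def bag_of_words_score(query: str, document: str) -> int:
    """Count overlapping terms, ignoring word order (simplified stand-in for BM25)."""
    query_terms = Counter(query.lower().split())
    doc_terms = Counter(document.lower().split())
    return sum(min(count, doc_terms[term]) for term, count in query_terms.items())


def bigram_overlap_score(query: str, document: str) -> int:
    """Count shared adjacent word pairs, so word order matters (toy stand-in for a Ranker model)."""

    def bigrams(text: str) -> set:
        words = text.lower().split()
        return set(zip(words, words[1:]))

    return len(bigrams(query) & bigrams(document))


def rerank(query: str, documents: list, top_k=None) -> list:
    """Reorder candidate documents by the order-aware score, best first."""
    ranked = sorted(documents, key=lambda doc: bigram_overlap_score(query, doc), reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


query = "dog bites man"
candidates = ["man bites dog", "dog bites man"]  # same words, different order

# The bag-of-words scorer gives both candidates the same score.
bow_scores = [bag_of_words_score(query, doc) for doc in candidates]

# The order-aware second stage moves the exact-order match to the top.
reranked = rerank(query, candidates)

print(bow_scores)
print(reranked)
```

In a real pipeline the second stage would be a trained model rather than a bigram count, but the division of labour is the same: a cheap first stage gathers candidates, and a more expensive, order-sensitive stage sorts them.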
The Ranker needs to be initialised with a model trained on a text pair classification task.
SentenceTransformersRanker has a train() method to allow for this training.
Alternatively, this FARM script shows how to train a text pair classification model.