Some "semantic document search" use cases need only document ranking, not question answering. The Retriever already handles document retrieval, and the Ranker can further improve its results. For example, BM25 (a sparse retriever) matches only keywords and does not take the semantics of the documents and the query into account. The Ranker re-ranks the results of the retriever step by taking semantics into account. Like the Reader, it is based on the latest language models, but instead of returning answers, it returns documents in re-ranked order.
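To make the keyword limitation concrete, here is a minimal sketch in plain Python. It does not implement BM25 and does not use Haystack; `keyword_overlap` is a hypothetical helper that only counts shared terms, which is enough to show why a keyword-only score can miss a paraphrased document.

```python
import re


def keyword_overlap(query: str, document: str) -> int:
    """Count distinct query terms that also occur in the document.

    A crude stand-in for keyword-based (sparse) scoring, NOT BM25 itself.
    """
    def tokenize(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    return len(tokenize(query) & tokenize(document))


query = "cheap car insurance"
docs = [
    "Car insurance quotes compared",         # shares keywords with the query
    "Affordable automobile coverage plans",  # paraphrase, no shared keywords
]

# The paraphrased document scores zero although it is on topic.
scores = [keyword_overlap(query, doc) for doc in docs]
print(scores)
```

A semantic re-ranking step is meant to close exactly this gap: it scores the query and each candidate document together instead of comparing their terms.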
Without a Ranker and its re-ranking step, querying is faster, but the results might be of lower quality. If you want to do "semantic document search" rather than question answering, start with a Retriever only. If the semantic similarity between the query and the returned documents is low, add a Ranker.
Note that a Ranker needs to be initialised with a model trained on a text pair classification task. You can also train the model with the train() method of the Ranker. Alternatively, this example shows how to train a text pair classification model in FARM.
The SentenceTransformersRanker wraps a pre-trained, Sentence Transformers based Cross-Encoder model for document re-ranking (https://huggingface.co/cross-encoder). Re-ranking can be used on top of a retriever to improve document search results. This is particularly useful if the retriever has high recall but is poor at sorting the documents by relevance. The SentenceTransformersRanker handles Cross-Encoder models that use a single logit as the similarity score (https://www.sbert.net/docs/pretrained-models/ce-msmarco.html#usage-with-transformers). This score describes the similarity of the cross-encoded query and document text.
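The re-ranking step itself can be sketched without any model: score each (query, document) pair with a single number and sort the documents by it. In the sketch below, `rerank`, `toy_score_pair`, and the values in `toy_logits` are hypothetical and purely illustrative; a real Cross-Encoder would compute the score from the query and document text, and this is not how the SentenceTransformersRanker is implemented internally.

```python
from typing import Callable, List, Optional


def rerank(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[str]:
    """Return the documents ordered by descending pair score."""
    scored = [(score_pair(query, doc), doc) for doc in documents]
    # list.sort is stable, so documents with equal scores keep retriever order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [doc for _, doc in scored]
    return ranked if top_k is None else ranked[:top_k]


# Made-up logits standing in for the output of a Cross-Encoder model.
toy_logits = {
    "doc about football": -3.2,
    "doc about cars": 0.7,
    "doc about car insurance": 4.1,
}


def toy_score_pair(query: str, document: str) -> float:
    # Assumption: a real scorer would encode query and document jointly.
    return toy_logits[document]


retrieved = ["doc about football", "doc about cars", "doc about car insurance"]
reranked = rerank("cheap car insurance", retrieved, toy_score_pair, top_k=2)
print(reranked)
```

Keeping the scorer as a plain callable makes the trade-off visible: the model is called once per retrieved document, so re-ranking cost grows with the number of candidates the retriever passes on.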
With a SentenceTransformersRanker, you can:
- Directly get predictions (a re-ranked version of the supplied list of Document objects) via predict() when supplying a pre-trained model
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import ElasticsearchRetriever, SentenceTransformersRanker
from haystack import Pipeline

document_store = ElasticsearchDocumentStore()
...
# The retriever fetches candidate documents; the ranker re-orders them.
retriever = ElasticsearchRetriever(document_store)
ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
...
# Chain the nodes: Query -> ESRetriever -> Ranker
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])