
JinaRanker

Use this component to rank Documents based on their similarity to the query using Jina AI models.

Name: JinaRanker
Path: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina
Most common position in a pipeline: In a query pipeline, after a component that returns a list of documents (such as a Retriever).
Mandatory input variables: "query": A query string; "documents": A list of Document objects
Output variables: "documents": A list of Document objects

Overview

JinaRanker ranks the given documents based on how similar they are to the given query. It uses Jina AI ranking models (check out the full list at Jina AI's website). The default model for this Ranker is jina-reranker-v1-base-en.

Additionally, you can use the optional top_k and score_threshold parameters with JinaRanker:

  • The Ranker's top_k is the number of documents it returns (if it's the last component in the pipeline) or forwards to the next component.
  • If you set the score_threshold for the Ranker, it will only return documents with a similarity score (computed by the Jina AI model) above this threshold.
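
To make the interaction of the two parameters concrete, here is a minimal plain-Python sketch of the selection logic described above. It is not the component's implementation: ScoredDoc and select_ranked are hypothetical names, the scores are made up for illustration, and whether a score exactly equal to the threshold is kept is an assumption (the sketch drops it).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a ranked Document; not a Haystack class."""
    content: str
    score: float


def select_ranked(docs: List[ScoredDoc],
                  top_k: Optional[int] = None,
                  score_threshold: Optional[float] = None) -> List[ScoredDoc]:
    """Order docs by score, drop low scorers, then keep at most top_k."""
    # Highest similarity first.
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    # Assumption: only scores strictly above the threshold survive.
    if score_threshold is not None:
        ranked = [d for d in ranked if d.score > score_threshold]
    # top_k caps how many documents are returned or forwarded.
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Illustrative scores only; real scores come from the Jina AI model.
scored = [ScoredDoc("Berlin", 0.12), ScoredDoc("Paris", 0.91), ScoredDoc("Lyon", 0.78)]
selected = select_ranked(scored, top_k=2, score_threshold=0.5)
print([d.content for d in selected])
```

With both parameters set, the result can contain fewer than top_k documents when too few of them clear the threshold.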

Installation

To start using this integration with Haystack, install the package with:

pip install jina-haystack

Authorization

By default, the component reads the API key from the JINA_API_KEY environment variable. Alternatively, you can pass a Jina API key at initialization with api_key like this:

from haystack.utils import Secret
ranker = JinaRanker(api_key=Secret.from_token("<your-api-key>"))

To get your API key, head to Jina AI's website.
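
The lookup order described in this section (an explicitly passed key first, the JINA_API_KEY environment variable otherwise) can be sketched in plain Python. This is only an illustration of the precedence, not how the component resolves its key internally; resolve_jina_api_key is a hypothetical helper name, and raising an error when no key is found is an assumption.

```python
import os
from typing import Optional


def resolve_jina_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to JINA_API_KEY."""
    # A key passed at initialization takes precedence.
    if explicit_key:
        return explicit_key
    # Otherwise look the key up in the environment.
    env_key = os.environ.get("JINA_API_KEY")
    if not env_key:
        # Assumption: failing early is preferable to sending an empty key.
        raise ValueError("No Jina API key: pass one explicitly or set JINA_API_KEY")
    return env_key


# An explicit key is used as-is, whatever the environment contains.
key = resolve_jina_api_key("<your-api-key>")
```

Checking for the key before building a pipeline can surface a missing credential earlier than the first ranking request would.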

Usage

On its own

You can use JinaRanker outside of a pipeline to order documents based on your query.

To run the Ranker, pass a query, provide the documents, and set the number of documents to return in the top_k parameter.

from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker

docs = [Document(content="Paris"), Document(content="Berlin")]

ranker = JinaRanker()

ranker.run(query="City in France", documents=docs, top_k=1)

In a Pipeline

This is an example of a pipeline that retrieves documents from an InMemoryDocumentStore based on keyword search (using InMemoryBM25Retriever). It then uses the JinaRanker to rank the retrieved documents according to their similarity to the query.

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack_integrations.components.rankers.jina import JinaRanker

docs = [Document(content="Paris is in France"), 
        Document(content="Berlin is in Germany"),
        Document(content="Lyon is in France")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = JinaRanker()

ranker_pipeline = Pipeline()
ranker_pipeline.add_component(instance=retriever, name="retriever")
ranker_pipeline.add_component(instance=ranker, name="ranker")

# Feed the documents returned by the retriever into the ranker
ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
# Both components need the query; each one gets its own top_k
ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                          "ranker": {"query": query, "top_k": 2}})
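
The run call above takes a dictionary keyed by component name, and the same query has to appear under both "retriever" and "ranker". A small helper can keep the two entries in sync. This is a sketch only: build_run_data is a hypothetical name, not part of Haystack, and rejecting a ranker top_k larger than the retriever's is a design choice assumed here, since the ranker cannot return more documents than it receives.

```python
from typing import Any, Dict


def build_run_data(query: str,
                   retriever_top_k: int,
                   ranker_top_k: int) -> Dict[str, Dict[str, Any]]:
    """Build the per-component inputs for a retriever -> ranker pipeline."""
    # Assumption: treat a ranker top_k above the retriever's as a mistake.
    if ranker_top_k > retriever_top_k:
        raise ValueError("ranker_top_k cannot exceed retriever_top_k")
    # Keys match the component names used in add_component.
    return {
        "retriever": {"query": query, "top_k": retriever_top_k},
        "ranker": {"query": query, "top_k": ranker_top_k},
    }


run_data = build_run_data("Cities in France", retriever_top_k=3, ranker_top_k=2)
print(run_data)
```

The resulting dictionary can be passed as the data argument of the pipeline's run call.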

Related Links

Check out the API reference in the GitHub repo or in our docs: