Module docs2answers

Docs2Answers

class Docs2Answers(BaseComponent)

This node converts retrieved Documents into the format of predicted Answers.

It is useful when you call a Retriever-only pipeline through the REST API, because it ensures that the output is in a compatible format.

Arguments:

  • progress_bar: Whether to show a progress bar
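The conversion can be pictured with a minimal Python sketch. It is not Haystack's implementation: the Document and Answer dataclasses and the docs_to_answers function are hypothetical stand-ins, and using the Document text as the answer field is an assumption. The real node may fill the Answer fields differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a retrieved Document, not Haystack's class.
    id: str
    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Answer:
    # Hypothetical stand-in for a predicted Answer with a reduced field set.
    answer: str
    score: Optional[float]
    context: str
    document_ids: List[str]
    meta: Dict[str, Any]


def docs_to_answers(documents: List[Document]) -> List[Answer]:
    """Wrap each retrieved Document in an Answer object."""
    answers = []
    for doc in documents:
        answers.append(
            Answer(
                answer=doc.content,     # assumption: the Document text serves as the answer
                score=doc.score,        # keep the Retriever's score
                context=doc.content,
                document_ids=[doc.id],  # link back to the source Document
                meta=dict(doc.meta),    # copy so the Answer does not share the Document's dict
            )
        )
    return answers


docs = [Document(id="d1", content="Berlin is the capital of Germany.", score=0.9)]
print(docs_to_answers(docs)[0].document_ids)  # ['d1']
```

In this sketch each Answer keeps the Retriever score and the id of its source Document, so a client that expects Reader-style output can consume it without a special case.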

Module join_docs

JoinDocuments

class JoinDocuments(JoinNode)

A node to join documents output by multiple retriever nodes.

The node supports the following join modes:

  • concatenate: combine the documents from multiple nodes. Duplicate documents are discarded, so the score of a document is determined only by the last node that outputs it.
  • merge: merge the scores of documents from multiple nodes. Optionally, each input score can be given a different weight, and a top_k limit can be set. This mode can also be used to rerank retrieved documents.
  • reciprocal_rank_fusion: combine the documents based on their ranks across the input nodes.
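The three join modes can be modeled in plain Python. This is a simplified sketch under stated assumptions, not Haystack's implementation: Document is a hypothetical stand-in, merge is assumed to be a weighted sum of scores, and reciprocal_rank_fusion uses the 1 / (k + rank) formulation with 1-based ranks and k = 60, the constant common in the RRF literature. Haystack's exact constants and any score normalization may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a Document, not Haystack's class.
    id: str
    content: str
    score: Optional[float] = None


def _rank(inputs: List[List[Document]], scores: Dict[str, float]) -> List[Document]:
    """Attach the combined scores and sort Documents best-first."""
    by_id = {doc.id: doc for docs in inputs for doc in docs}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [replace(by_id[doc_id], score=score) for doc_id, score in ranked]


def concatenate(inputs: List[List[Document]]) -> List[Document]:
    """Drop duplicates by id; a later node's copy overwrites an earlier one,
    so the surviving score is the one from the last node."""
    by_id: Dict[str, Document] = {}
    for docs in inputs:
        for doc in docs:
            by_id[doc.id] = doc
    return list(by_id.values())


def merge(inputs: List[List[Document]], weights: List[float]) -> List[Document]:
    """Weighted sum of each Document's scores; assumes every score is numeric."""
    scores: Dict[str, float] = defaultdict(float)
    for docs, weight in zip(inputs, weights):
        for doc in docs:
            scores[doc.id] += doc.score * weight
    return _rank(inputs, scores)


def reciprocal_rank_fusion(inputs: List[List[Document]], k: int = 60) -> List[Document]:
    """Sum 1 / (k + rank) over every input that returned the Document.
    Each input is assumed to be ordered best-first; ranks start at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for docs in inputs:
        for rank, doc in enumerate(docs, start=1):
            scores[doc.id] += 1.0 / (k + rank)
    return _rank(inputs, scores)


a = [Document("d1", "x", 0.9), Document("d2", "y", 0.5)]
b = [Document("d2", "y", 0.8), Document("d3", "z", 0.4)]
print([d.id for d in merge([a, b], weights=[0.5, 0.5])])  # ['d2', 'd1', 'd3']
```

Unlike concatenate, both merge and reciprocal_rank_fusion reward a document that several nodes agree on: in the example, d2 moves to the top because both inputs return it. Reciprocal rank fusion ignores the raw scores entirely, which makes it usable when the retrievers produce scores on incomparable scales.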

JoinDocuments.__init__

def __init__(join_mode: str = "concatenate",
             weights: Optional[List[float]] = None,
             top_k_join: Optional[int] = None,
             sort_by_score: bool = True)

Arguments:

  • join_mode: "concatenate" to combine documents from multiple retrievers, "merge" to aggregate scores of individual documents, or "reciprocal_rank_fusion" to apply rank-based scoring.
  • weights: A node-wise list of weights for adjusting document scores when using the "merge" join_mode. The length of the list must equal the number of input nodes. By default, equal weight is given to each retriever score. This parameter is not compatible with the "concatenate" join_mode.
  • top_k_join: Limit the output to the top_k documents based on the resulting scores of the join.
  • sort_by_score: Whether to sort the incoming documents by their score. Set this to True if all your Documents come with score values. Set it to False if any of the Documents come from sources where the score is set to None, like TfidfRetriever on Elasticsearch.
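The following sketch illustrates how the weights, top_k_join, and sort_by_score arguments interact. The helper names resolve_weights and sort_and_truncate are hypothetical, and sorting the joined list before truncating it is an assumption about ordering. It also shows why sort_by_score=False matters: Python cannot order None against a float, so sorting a list that mixes None and numeric scores raises a TypeError.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a Document, not Haystack's class.
    id: str
    score: Optional[float] = None


def resolve_weights(num_inputs: int, weights: Optional[List[float]]) -> List[float]:
    """Default to equal weights; otherwise require exactly one weight per input node."""
    if weights is None:
        return [1.0 / num_inputs] * num_inputs
    if len(weights) != num_inputs:
        raise ValueError(f"Expected {num_inputs} weights, got {len(weights)}")
    return list(weights)


def sort_and_truncate(
    docs: List[Document], top_k_join: Optional[int], sort_by_score: bool
) -> List[Document]:
    """Optionally sort best-first, then keep at most top_k_join Documents."""
    if sort_by_score:
        # Comparing None with a float raises TypeError, so lists containing
        # unscored Documents should be passed with sort_by_score=False.
        docs = sorted(docs, key=lambda d: d.score, reverse=True)
    if top_k_join is not None:
        docs = docs[:top_k_join]
    return docs


print(resolve_weights(2, None))  # [0.5, 0.5]
```

With sort_by_score=False the sketch keeps the incoming order, so top_k_join then simply keeps the first Documents it received.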

Module join_answers

JoinAnswers

class JoinAnswers(JoinNode)

A node to join Answers produced by multiple Reader nodes.

JoinAnswers.__init__

def __init__(join_mode: str = "concatenate",
             weights: Optional[List[float]] = None,
             top_k_join: Optional[int] = None,
             sort_by_score: bool = True)

Arguments:

  • join_mode: "concatenate" to combine Answers from multiple Readers. "merge" to aggregate scores of individual Answers.
  • weights: A node-wise list (length of list must be equal to the number of input nodes) of weights for adjusting Answer scores when using the "merge" join_mode. By default, equal weight is assigned to each Reader score. This parameter is not compatible with the "concatenate" join_mode.
  • top_k_join: Limit Answers to top_k based on the resulting scores of the join.
  • sort_by_score: Whether to sort the incoming Answers by their score. Set this to True if your Answers come from a Reader or TableReader. Set it to False if any Answers come from a Generator, since a Generator assigns None as the score of each Answer.
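A compact sketch of the node's behavior, including the Generator case, follows. The Answer dataclass and the join_answers function are hypothetical, and scaling each Answer's score by its input's weight in "merge" mode is an assumption about how the weights are applied. Answers are not deduplicated in this sketch.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Answer:
    # Hypothetical stand-in for an Answer, not Haystack's class.
    answer: str
    score: Optional[float] = None


def join_answers(
    inputs: List[List[Answer]],
    join_mode: str = "concatenate",
    weights: Optional[List[float]] = None,
    top_k_join: Optional[int] = None,
    sort_by_score: bool = True,
) -> List[Answer]:
    """Pool Answers from several nodes, optionally weighting, sorting, and truncating."""
    if join_mode == "concatenate":
        if weights is not None:
            raise ValueError('weights are not compatible with the "concatenate" join_mode')
    elif join_mode == "merge":
        if weights is None:
            weights = [1.0 / len(inputs)] * len(inputs)  # equal weight per input node
        if len(weights) != len(inputs):
            raise ValueError("weights must have one entry per input node")
        # Assumption: each score is scaled by its input's weight; None scores stay None.
        inputs = [
            [replace(a, score=a.score * w) if a.score is not None else a for a in answers]
            for answers, w in zip(inputs, weights)
        ]
    else:
        raise ValueError(f"Unknown join_mode: {join_mode!r}")

    pooled = [a for answers in inputs for a in answers]
    if sort_by_score:
        # Can raise TypeError if a Generator contributed Answers with score None.
        pooled.sort(key=lambda a: a.score, reverse=True)
    return pooled[:top_k_join] if top_k_join is not None else pooled


reader_1 = [Answer("Berlin", 0.8), Answer("Munich", 0.3)]
generator = [Answer("Berlin is the capital of Germany.")]  # score is None
print([a.answer for a in join_answers([reader_1, generator], sort_by_score=False)])
# ['Berlin', 'Munich', 'Berlin is the capital of Germany.']
```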

Module route_documents

RouteDocuments

class RouteDocuments(BaseComponent)

A node to split a list of Documents by content_type or by the values of a metadata field and route them to different nodes.

RouteDocuments.__init__

def __init__(split_by: str = "content_type",
             metadata_values: Optional[List[str]] = None)

Arguments:

  • split_by: Field to split the Documents by, either "content_type" or a metadata field name. If this parameter is set to "content_type", the Documents are split into a list containing only Documents of type "text", which is routed to "output_1", and a list containing only Documents of type "table", which is routed to "output_2". If this parameter is set to a metadata field name, you must also specify the parameter metadata_values.
  • metadata_values: If split_by is set to a metadata field name, you must provide a list of values by which to group the Documents. Documents whose metadata field equals the first value in the list are routed to "output_1", Documents whose metadata field equals the second value are routed to "output_2", and so on.
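The routing rules can be sketched as follows. Document and route_documents are hypothetical stand-ins rather than Haystack's API, and dropping Documents whose value matches none of the entries is an assumption; the real node may handle them differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a Document, not Haystack's class.
    id: str
    content_type: str = "text"
    meta: Dict[str, Any] = field(default_factory=dict)


def route_documents(
    documents: List[Document],
    split_by: str = "content_type",
    metadata_values: Optional[List[str]] = None,
) -> Dict[str, List[Document]]:
    """Group Documents into the edges "output_1", "output_2", and so on."""
    if split_by == "content_type":
        values = ["text", "table"]  # text -> output_1, table -> output_2
    elif metadata_values:
        values = list(metadata_values)  # first value -> output_1, second -> output_2, ...
    else:
        raise ValueError("metadata_values is required when split_by is a metadata field")

    outputs: Dict[str, List[Document]] = {f"output_{i}": [] for i in range(1, len(values) + 1)}
    for doc in documents:
        value = doc.content_type if split_by == "content_type" else doc.meta.get(split_by)
        if value in values:
            outputs[f"output_{values.index(value) + 1}"].append(doc)
        # Assumption: Documents matching none of the values are dropped here.
    return outputs


docs = [Document("x", meta={"lang": "en"}), Document("y", meta={"lang": "de"})]
routed = route_documents(docs, split_by="lang", metadata_values=["de", "en"])
print({edge: [d.id for d in ds] for edge, ds in routed.items()})
# {'output_1': ['y'], 'output_2': ['x']}
```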