Pipelines

Module: pipeline

Class: Pipeline

class Pipeline()

Pipeline combines building blocks into a complex search pipeline made of Haystack and user-defined components.

Under the hood, a pipeline is represented as a directed acyclic graph of component nodes. It enables custom query flows with options to branch queries (e.g., extractive QA vs. keyword match query), merge candidate documents for a Reader from multiple Retrievers, or re-rank candidate documents.
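The graph idea can be illustrated with a toy sketch in plain Python. This is not Haystack's implementation: the ToyPipeline class, the "Query" root name, and the convention that each component receives a list of its predecessors' outputs are all assumptions made for illustration.

```python
class ToyPipeline:
    """Illustrative stand-in for a pipeline graph; not Haystack's code."""

    def __init__(self):
        self.components = {}  # node name -> callable
        self.inputs = {}      # node name -> names of predecessor nodes

    def add_node(self, component, name, inputs):
        # Every input must already exist, so edges only point forward
        # and the graph stays acyclic.
        for parent in inputs:
            if parent != "Query" and parent not in self.components:
                raise ValueError(f"unknown input node: {parent}")
        self.components[name] = component
        self.inputs[name] = list(inputs)

    def run(self, query):
        # Insertion order is a valid topological order because of the
        # check in add_node, so each node runs after its predecessors.
        outputs = {"Query": query}
        result = query
        for name, component in self.components.items():
            result = component([outputs[p] for p in self.inputs[name]])
            outputs[name] = result
        return result


pipe = ToyPipeline()
pipe.add_node(lambda ins: ins[0].lower().split(), "Tokenizer", inputs=["Query"])
# Two branches read from the same predecessor ...
pipe.add_node(lambda ins: [t for t in ins[0] if len(t) > 3], "LongWords", inputs=["Tokenizer"])
pipe.add_node(lambda ins: [t for t in ins[0] if len(t) <= 3], "ShortWords", inputs=["Tokenizer"])
# ... and a final node merges them again.
pipe.add_node(lambda ins: sorted(ins[0] + ins[1]), "Join", inputs=["LongWords", "ShortWords"])
result = pipe.run("Who wrote Hamlet")
```

Requiring that every input already exists when a node is added is one simple way to rule out cycles; a real implementation may validate the graph differently.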

add_node

 | add_node(component, name: str, inputs: List[str])

Add a new node to the pipeline.

Arguments:

  • component: The object to be called when the data is passed to the node. It can be a Haystack component (like Retriever, Reader, or Generator) or a user-defined object that implements a run() method to process incoming data from the predecessor node.
  • name: The name for the node. It must not contain any dots.
  • inputs: A list of inputs to the node. If the predecessor node has a single outgoing edge, the name of the node is sufficient. For instance, an 'ElasticsearchRetriever' node always outputs a single edge with a list of documents, so it can be represented as ["ElasticsearchRetriever"].

In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output must be specified explicitly as "QueryClassifier.output_2".
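The naming rules above fit together: because node names cannot contain dots, the first dot in an input entry can separate the node name from the output edge. A minimal sketch of that parsing follows. The helpers parse_input and validate_name are hypothetical, and treating "output_1" as the default edge is an assumption, not something this reference states.

```python
def validate_name(name):
    """Reject node names containing a dot (hypothetical helper)."""
    if "." in name:
        raise ValueError(f"node name must not contain dots: {name!r}")
    return name


def parse_input(spec):
    """Split one entry of `inputs` into (node_name, output_edge).

    Assumption: a bare node name refers to an edge called "output_1".
    """
    node_name, sep, edge = spec.partition(".")
    return node_name, (edge if sep else "output_1")


single = parse_input("ElasticsearchRetriever")
branch = parse_input("QueryClassifier.output_2")
```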

get_node

 | get_node(name: str)

Get a node from the Pipeline.

Arguments:

  • name: The name of the node.

set_node

 | set_node(name: str, component)

Set the component for a node in the Pipeline.

Arguments:

  • name: The name of the node.
  • component: The component object to be set at the node.

draw

 | draw(path: Path = Path("pipeline.png"))

Create a Graphviz visualization of the pipeline.

Arguments:

  • path: The path where the image is saved.

Class: BaseStandardPipeline

class BaseStandardPipeline()

add_node

 | add_node(component, name: str, inputs: List[str])

Add a new node to the pipeline.

Arguments:

  • component: The object to be called when the data is passed to the node. It can be a Haystack component (like Retriever, Reader, or Generator) or a user-defined object that implements a run() method to process incoming data from the predecessor node.
  • name: The name for the node. It must not contain any dots.
  • inputs: A list of inputs to the node. If the predecessor node has a single outgoing edge, the name of the node is sufficient. For instance, an 'ElasticsearchRetriever' node always outputs a single edge with a list of documents, so it can be represented as ["ElasticsearchRetriever"].

In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output must be specified explicitly as "QueryClassifier.output_2".

get_node

 | get_node(name: str)

Get a node from the Pipeline.

Arguments:

  • name: The name of the node.

set_node

 | set_node(name: str, component)

Set the component for a node in the Pipeline.

Arguments:

  • name: The name of the node.
  • component: The component object to be set at the node.

draw

 | draw(path: Path = Path("pipeline.png"))

Create a Graphviz visualization of the pipeline.

Arguments:

  • path: The path where the image is saved.

Class: ExtractiveQAPipeline

class ExtractiveQAPipeline(BaseStandardPipeline)

__init__

 | __init__(reader: BaseReader, retriever: BaseRetriever)

Initialize a Pipeline for Extractive Question Answering.

Arguments:

  • reader: Reader instance
  • retriever: Retriever instance

Class: DocumentSearchPipeline

class DocumentSearchPipeline(BaseStandardPipeline)

__init__

 | __init__(retriever: BaseRetriever)

Initialize a Pipeline for semantic document search.

Arguments:

  • retriever: Retriever instance

Class: GenerativeQAPipeline

class GenerativeQAPipeline(BaseStandardPipeline)

__init__

 | __init__(generator: BaseGenerator, retriever: BaseRetriever)

Initialize a Pipeline for Generative Question Answering.

Arguments:

  • generator: Generator instance
  • retriever: Retriever instance

Class: SearchSummarizationPipeline

class SearchSummarizationPipeline(BaseStandardPipeline)

__init__

 | __init__(summarizer: BaseSummarizer, retriever: BaseRetriever)

Initialize a Pipeline that retrieves documents for a query and then summarizes those documents.

Arguments:

  • summarizer: Summarizer instance
  • retriever: Retriever instance

run

 | run(query: str, filters: Optional[Dict] = None, top_k_retriever: int = 10, generate_single_summary: bool = False, return_in_answer_format=False)

Arguments:

  • query: Your search query
  • filters:
  • top_k_retriever: Number of top documents the retriever should pass to the summarizer. The higher this value, the slower your pipeline.
  • generate_single_summary: Whether to generate a single summary from all retrieved documents (True) or one summary per document (False).
  • return_in_answer_format: Whether the results should be returned as documents (False) or in the answer format used in other QA pipelines (True). With the latter, you can use this pipeline as a "drop-in replacement" for other QA pipelines.
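The effect of top_k_retriever and generate_single_summary can be pictured with a toy sketch. Everything here is an assumption for illustration: toy_summarize is a stand-in that keeps the first few words (a real summarizer is a model), documents are plain strings, and summarize_results is a hypothetical helper rather than part of the pipeline's API.

```python
def toy_summarize(text, max_words=5):
    # Stand-in summarizer: keeps only the first `max_words` words.
    return " ".join(text.split()[:max_words])


def summarize_results(docs, top_k_retriever=10, generate_single_summary=False):
    # Only the top-ranked documents reach the summarizer.
    docs = docs[:top_k_retriever]
    if generate_single_summary:
        # One summary over all retrieved documents taken together.
        return [toy_summarize(" ".join(docs))]
    # Otherwise one summary per retrieved document.
    return [toy_summarize(doc) for doc in docs]


retrieved = ["a b c d e f g", "h i", "j k l"]
per_doc = summarize_results(retrieved, top_k_retriever=2)
single = summarize_results(retrieved, top_k_retriever=2, generate_single_summary=True)
```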

Class: FAQPipeline

class FAQPipeline(BaseStandardPipeline)

__init__

 | __init__(retriever: BaseRetriever)

Initialize a Pipeline for finding similar FAQs using semantic document search.

Arguments:

  • retriever: Retriever instance

Class: JoinDocuments

class JoinDocuments()

A node to join documents outputted by multiple retriever nodes.

The node allows multiple join modes:

  • concatenate: combine the documents from multiple nodes. Any duplicate documents are discarded.
  • merge: merge scores of documents from multiple nodes. Optionally, each input score can be given a different weight and a top_k limit can be set. This mode can also be used for "reranking" retrieved documents.

__init__

 | __init__(join_mode: str = "concatenate", weights: Optional[List[float]] = None, top_k_join: Optional[int] = None)

Arguments:

  • join_mode: concatenate to combine documents from multiple retrievers or merge to aggregate scores of individual documents.
  • weights: A node-wise list of weights for adjusting document scores when using the merge join mode. The length of the list must equal the number of input nodes. By default, equal weight is given to each retriever score. This parameter is not compatible with the concatenate join mode.
  • top_k_join: Limit documents to top_k based on the resulting scores of the join.
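The two join modes can be sketched in plain Python. This is an illustration under stated assumptions, not the node's actual code: join_documents is a hypothetical function, each input is modeled as a dict of document id to score, the default weight is taken to be 1 / number of inputs, and the result is sorted by score in both modes.

```python
def join_documents(results, join_mode="concatenate", weights=None, top_k_join=None):
    """Join per-node results, each a dict mapping document id -> score."""
    joined = {}
    if join_mode == "concatenate":
        if weights is not None:
            raise ValueError("weights are not compatible with the concatenate join mode")
        for node_result in results:
            for doc_id, score in node_result.items():
                # Keep the first occurrence; later duplicates are discarded.
                joined.setdefault(doc_id, score)
    elif join_mode == "merge":
        if weights is None:
            # Assumption: equal weighting means 1 / number of inputs.
            weights = [1.0 / len(results)] * len(results)
        if len(weights) != len(results):
            raise ValueError("need exactly one weight per input node")
        for node_result, weight in zip(results, weights):
            for doc_id, score in node_result.items():
                # Sum the weighted scores of the same document across nodes.
                joined[doc_id] = joined.get(doc_id, 0.0) + weight * score
    else:
        raise ValueError(f"unknown join_mode: {join_mode!r}")
    # Rank by resulting score, then apply the optional top_k_join limit.
    ranked = sorted(joined.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k_join is None else ranked[:top_k_join]


sparse = {"d1": 0.5, "d2": 0.25}
dense = {"d2": 0.75, "d3": 0.375}
concatenated = join_documents([sparse, dense])
merged = join_documents([sparse, dense], join_mode="merge", top_k_join=2)
```

In merge mode a document found by both retrievers ("d2" here) accumulates score from each, which is why it can overtake documents that only one retriever ranked highly.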
© 2020 - 2021 deepset. All rights reserved.