Summarizer

The Summarizer condenses a long Document into a short overview. This makes it useful for getting a quick glimpse of the Documents your Retriever returns.

You can use any summarization model from Hugging Face Transformers by providing the model name. By default, the Google Pegasus model is loaded.

Position in a Pipeline: After preprocessing in an indexing Pipeline or after the Retriever in a querying Pipeline
Input: Documents
Output: Documents
Classes: TransformersSummarizer

Usage

To initialize and run a stand-alone Summarizer:

from haystack import Document
from haystack.nodes import TransformersSummarizer

# Adjacent string literals are joined; each piece ends with a space so words do not run together
docs = [Document(
    "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
    "The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by "
    "the shutoffs which were expected to last through at least midday tomorrow."
)]

# Load the summarization model by its Hugging Face name and summarize the documents
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
summary = summarizer.predict(documents=docs, generate_single_summary=True)

After this call, summary should contain both the generated summary and the original document text:

[
    {
        "text": "California's largest electricity provider has turned off power to hundreds of thousands of customers.",
        "meta": {
            "context": "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
        },
        ...
    }
]
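
To make the structure above concrete, here is a minimal sketch of how the summary and its source text could be pulled out of such a result. It assumes the result is available as a list of plain dictionaries shaped like the listing above; the helper name summary_pairs and the sample data are hypothetical and not part of Haystack. Haystack itself may return Document objects instead of dictionaries, in which case attribute access would be needed.

```python
from typing import Any, Dict, List, Tuple


def summary_pairs(results: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (summary, original_text) pairs from summarizer-style output.

    Hypothetical helper: assumes each item is a dict with a "text" key
    (the summary) and a "meta" dict holding the source under "context".
    """
    pairs = []
    for item in results:
        summary = item.get("text", "")
        # Fall back to an empty string if no metadata or context is present
        context = item.get("meta", {}).get("context", "")
        pairs.append((summary, context))
    return pairs


# Sample data mirroring the structure shown above (assumed shape, plain dicts)
sample = [
    {
        "text": "California's largest electricity provider has turned off "
                "power to hundreds of thousands of customers.",
        "meta": {
            "context": "PG&E stated it scheduled the blackouts in response to "
                       "forecasts for high winds amid dry conditions."
        },
    }
]

for summary, context in summary_pairs(sample):
    print("Summary:", summary)
    print("Source :", context)
```

Keeping the summary next to its source text makes it easy to spot-check whether the model's output is faithful to the original Document.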

To use a Summarizer in a pipeline:

from haystack import Pipeline

# Assumes `retriever` and `summarizer` have already been initialized
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever1", inputs=["Query"])
p.add_node(component=summarizer, name="Summarizer", inputs=["ESRetriever1"])
res = p.run(query="What did Einstein work on?")