Tutorial: Creating a Generative QA Pipeline with PromptNode


  • Level: Advanced
  • Time to complete: 15 minutes
  • Nodes Used: InMemoryDocumentStore, BM25Retriever, PromptNode, PromptTemplate, Shaper
  • Goal: After completing this tutorial, you’ll have created a generative question answering search system that uses a large language model through PromptNode with the help of Shaper.

Overview

Learn how to build a generative question answering pipeline that uses a large language model (LLM) through PromptNode. In this tutorial, we’ll use the Wikipedia pages of the Seven Wonders of the Ancient World as Documents, but you can replace them with any text you want.

This tutorial introduces you to the new Shaper node and explains how to use Shaper to integrate PromptNode in the pipeline.

Preparing the Colab Environment

Installing Haystack

To start, let’s install the latest release of Haystack with pip:

%%bash

pip install --upgrade pip
pip install "farm-haystack[colab]"
# Quote the version specifier so the shell doesn't treat ">" as an output redirect
pip install "datasets>=2.6.1"

Enabling Telemetry

Knowing you’re using this tutorial helps us decide where to invest our efforts to build a better product, but you can always opt out by commenting out the following line. See Telemetry for more details.

from haystack.telemetry import tutorial_running

tutorial_running(22)

Initializing the DocumentStore

We’ll start creating our question answering system by initializing a DocumentStore. A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, we’re using the InMemoryDocumentStore.

Let’s initialize our DocumentStore.

from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_bm25=True)

InMemoryDocumentStore is the simplest DocumentStore to get started with. It requires no external dependencies and it’s a good option for smaller projects and debugging. But it doesn’t scale up so well to larger Document collections, so it’s not a good choice for production systems. To learn more about the DocumentStore and the different types of external databases that we support, see DocumentStore.

The DocumentStore is now ready. Next, let’s fill it with some Documents.

Fetching and Writing Documents

We’ll use the Wikipedia pages of the Seven Wonders of the Ancient World as Documents. We preprocessed the data and uploaded it to the Hugging Face Hub as the Seven Wonders dataset, so we don’t need to perform any additional cleaning or splitting.

Let’s fetch the data and write it to the DocumentStore:

from datasets import load_dataset

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")

document_store.write_documents(dataset)

Initializing the Retriever

Let’s initialize a BM25Retriever and make it use the InMemoryDocumentStore we initialized earlier in this tutorial. Setting top_k=2 makes the Retriever return the two Documents that best match each query:

from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store, top_k=2)
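BM25Retriever ranks Documents with the BM25 scoring function and keeps the `top_k` best matches. To make that ranking less of a black box, here is a minimal, dependency-free sketch of Okapi BM25. It is an illustration only, not Haystack’s implementation: the whitespace tokenizer, the Lucene-style IDF, the defaults `k1=1.5` and `b=0.75`, and the `bm25_top_k` name are all assumptions chosen for readability.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an illustrative assumption).
    return text.lower().split()


def bm25_top_k(
    query: str, docs: list[str], top_k: int = 2, k1: float = 1.5, b: float = 0.75
) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the top_k docs under Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 to avoid dividing by zero.
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append((idx, score))
    # Highest score first; keep only top_k, mirroring BM25Retriever's top_k.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


docs = [
    "The Colossus of Rhodes was a statue of the sun god Helios",
    "The Great Pyramid of Giza is the oldest of the Seven Wonders",
    "The Hanging Gardens of Babylon were described by Greek writers",
]
print(bm25_top_k("statue of Rhodes", docs, top_k=2))
```

Rare terms such as "statue" and "Rhodes" carry a high IDF, so the first passage wins, while a common word like "of" contributes little to any score.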

Initializing the PromptNode

Let’s define a custom prompt to use with our PromptNode. This prompt accepts $documents and $query as parameters: $documents will match the output of the Shaper, and $query will match the query we pass at runtime.

We’ll initialize PromptNode with the new PromptTemplate and the google/flan-t5-large model.

from haystack.nodes import PromptNode, PromptTemplate

lfqa_prompt = PromptTemplate(
    name="lfqa",
    prompt_text="""Synthesize a comprehensive answer from the following text for the given question. 
                             Provide a clear and concise response that summarizes the key points and information presented in the text. 
                             Your answer should be in your own words and be no longer than 50 words. 
                             \n\n Related text: $documents \n\n Question: $query \n\n Answer:""",
)

prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=lfqa_prompt)

To learn more about using custom templates with PromptNode, check out the Customizing PromptNode for NLP Tasks tutorial.
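The prompt’s `$documents` and `$query` placeholders use the same `$name` style as Python’s standard `string.Template`. As a rough illustration of how the variables get filled (an assumption about the mechanics, not Haystack’s internal code), this sketch substitutes joined Documents and a query into a shortened version of the lfqa prompt. `fill_prompt` and `LFQA_PROMPT` are hypothetical names.

```python
from string import Template

# Shortened version of the lfqa prompt; the $documents and $query names match the tutorial.
LFQA_PROMPT = (
    "Synthesize a comprehensive answer from the following text for the given question. "
    "\n\n Related text: $documents \n\n Question: $query \n\n Answer:"
)


def fill_prompt(prompt_text: str, documents: str, query: str) -> str:
    """Hypothetical helper: substitute the joined documents and the query into the prompt."""
    # substitute() raises KeyError if a placeholder has no value, which surfaces
    # mismatched variable names (e.g. a misspelled $text instead of $documents) early.
    return Template(prompt_text).substitute(documents=documents, query=query)


print(fill_prompt(LFQA_PROMPT, "First retrieved passage. Second retrieved passage.", "What did the Colossus depict?"))
```

Failing loudly on an unknown placeholder is the main reason to prefer `substitute()` over `safe_substitute()` in a sketch like this.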

Initializing the Shaper

Shaper is necessary when the output of one node doesn’t match what the next node expects as input. In our pipeline, the Retriever returns a list of Documents, but the prompt has a single $documents placeholder, so we need to join the retrieved Documents before injecting them into the prompt. We can do this with a Shaper that uses join_documents as its function (func). The Retriever outputs its Documents under the name documents, and join_documents expects a documents parameter, so we pass inputs={"documents": "documents"} to the Shaper. To output the joined Documents as documents, the name the prompt expects, we set outputs=["documents"].

Let’s initialize the Shaper:

from haystack.nodes import Shaper

shaper = Shaper(func="join_documents", inputs={"documents": "documents"}, outputs=["documents"])
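Conceptually, `join_documents` collapses the retrieved Documents into one Document whose content is the concatenation of their texts, so a single `$documents` placeholder can hold all of them. The sketch below mimics that behavior with plain dictionaries. The `join_documents_sketch` name and its `delimiter` parameter are illustrative assumptions, and Haystack’s own delimiter default may differ.

```python
def join_documents_sketch(documents: list[dict], delimiter: str = " ") -> list[dict]:
    """Hypothetical stand-in for Shaper's join_documents: merge all contents into one document."""
    joined = delimiter.join(doc["content"] for doc in documents)
    # Return a one-element list so the output keeps the same "documents" shape.
    return [{"content": joined}]


retrieved = [
    {"content": "First retrieved passage."},
    {"content": "Second retrieved passage."},
]
print(join_documents_sketch(retrieved))
```

Keeping the output as a list means the next node can consume it under the same `documents` name it would have used for the Retriever’s output.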

Defining the Pipeline

We’ll use a custom pipeline with the Retriever, Shaper, and PromptNode.

from haystack.pipelines import Pipeline

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=shaper, name="shaper", inputs=["retriever"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["shaper"])

That’s it! The pipeline’s ready to generate answers to questions!
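Each add_node call names a component and lists the node whose output it consumes, so a query flows Query → retriever → shaper → prompt_node. The toy runner below is a hypothetical, dependency-free model of that linear flow, not how Haystack’s Pipeline is implemented. Each stand-in node reads from and writes to a shared dictionary, which shows why the Shaper’s `inputs` and `outputs` names have to line up with its neighbors. All function names here are assumptions for illustration.

```python
from typing import Callable

# A node takes the shared state and returns a dict of new or updated keys.
Node = Callable[[dict], dict]


def run_linear_pipeline(nodes: list[tuple[str, Node]], query: str) -> dict:
    """Hypothetical toy runner: execute nodes in order, merging each node's outputs into the state."""
    state: dict = {"query": query}
    for name, node in nodes:  # name only mirrors add_node's name argument
        state.update(node(state))
    return state


def fake_retriever(state: dict) -> dict:
    # Stand-in for BM25Retriever: returns two hard-coded passages.
    return {"documents": ["First passage.", "Second passage."]}


def fake_shaper(state: dict) -> dict:
    # Stand-in for Shaper(func="join_documents"): reads "documents", writes "documents".
    return {"documents": [" ".join(state["documents"])]}


def fake_prompt_node(state: dict) -> dict:
    # Stand-in for PromptNode: builds the prompt instead of calling an LLM.
    prompt = f"Related text: {state['documents'][0]} Question: {state['query']} Answer:"
    return {"results": [prompt]}


output = run_linear_pipeline(
    [("retriever", fake_retriever), ("shaper", fake_shaper), ("prompt_node", fake_prompt_node)],
    query="What did the Colossus of Rhodes look like?",
)
print(output["results"])
```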

Asking a Question

We use the pipeline’s run() method to ask a question. The generated answer is stored under the results key of the output:

output = pipe.run(query="How does Rhodes Statue look like?")

print(output["results"])

Here are some other example queries to test:

examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "How does Rhodes Statue look like?",
    "Why did people visit the Temple of Artemis?",
    "What is the importance of Colossus of Rhodes?",
    "What happened to the Tomb of Mausolus?",
    "How did Colossus of Rhodes collapse?",
]
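To try all of these queries at once, you could loop over the list and collect each generated answer. The helper below is a sketch that assumes only what this tutorial shows: run() accepts a `query` keyword and returns a dictionary with a `results` list. `ask_all` is a hypothetical name, and a stand-in `fake_run` is used so the snippet runs on its own; in the tutorial you would pass `pipe.run` and `examples` instead.

```python
from typing import Callable


def ask_all(run: Callable[..., dict], questions: list[str]) -> dict[str, list]:
    """Hypothetical helper: run each question through a pipeline-like run() and collect the results."""
    answers = {}
    for question in questions:
        output = run(query=question)
        # Assumes the output shape used above: output["results"] holds the generated answers.
        answers[question] = output["results"]
    return answers


def fake_run(query: str) -> dict:
    # Stand-in for pipe.run so this snippet runs without Haystack or a model download.
    return {"results": [f"(answer to: {query})"]}


for question, results in ask_all(fake_run, ["Where is Gardens of Babylon?"]).items():
    print(question, "->", results)

# In the tutorial: ask_all(pipe.run, examples)
```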

🎉 Congratulations! You’ve learned how to create a generative QA system for your documents with PromptNode.