Build with Gemma and Haystack 2.x
Last Updated: September 20, 2024
We will see what we can build with the new Google Gemma open models and the Haystack LLM framework.
Installation
! pip install haystack-ai "huggingface_hub>=0.22.0"
Authorization
- you need a Hugging Face account
- you need to accept Google's usage conditions here: https://huggingface.co/google/gemma-7b-it and wait for authorization
import getpass, os
os.environ["HF_API_TOKEN"] = getpass.getpass("Your Hugging Face token")
Chat with Gemma (travel assistant) 🛩
For simplicity, we call the model using the free Hugging Face Inference API with the HuggingFaceAPIChatGenerator. (We could also load it in Colab, in a quantized version, using the HuggingFaceLocalChatGenerator.)
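A rough sketch of the local, quantized alternative (assumes a GPU runtime with the transformers, accelerate, and bitsandbytes packages installed; the parameters are illustrative, not prescriptive):

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from transformers import BitsAndBytesConfig

# Load Gemma locally in 4-bit so it fits into Colab GPU memory.
local_generator = HuggingFaceLocalChatGenerator(
    model="google/gemma-7b-it",
    huggingface_pipeline_kwargs={
        "model_kwargs": {"quantization_config": BitsAndBytesConfig(load_in_4bit=True)}
    },
    generation_kwargs={"max_new_tokens": 350},
)
local_generator.warm_up()
```

After warm-up, it is called just like the API generator, via run(messages=...).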
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "google/gemma-7b-it"},
    generation_kwargs={"max_new_tokens": 350})

messages = []

while True:
    msg = input("Enter your message or Q to exit\n🧑 ")
    if msg == "Q":
        break
    messages.append(ChatMessage.from_user(msg))
    response = generator.run(messages=messages)
    assistant_resp = response["replies"][0]
    print("🤖 " + assistant_resp.content)
    messages.append(assistant_resp)
RAG with Gemma (about Rock music) 🎸
! pip install wikipedia
Load data from Wikipedia
favourite_bands="""Audioslave
Blink-182
Dire Straits
Evanescence
Green Day
Muse (band)
Nirvana (band)
Sum 41
The Cure
The Smiths""".split("\n")
from IPython.display import Image
from pprint import pprint
import rich
import random
import wikipedia
from haystack.dataclasses import Document
raw_docs=[]
for title in favourite_bands:
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url": page.url})
    raw_docs.append(doc)
Indexing Pipeline
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
document_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=2))
indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("cleaner", "splitter")
indexing.connect("splitter", "writer")
indexing.run({"cleaner":{"documents":raw_docs}})
document_store.filter_documents()[0].meta
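To build an intuition for the splitter settings, here is a naive plain-Python sketch of splitting by sentence with split_length=2 (the real DocumentSplitter handles sentence boundaries more carefully):

```python
def naive_sentence_split(text, split_length=2):
    # Naive stand-in for DocumentSplitter(split_by="sentence", split_length=2):
    # cut the text on ". " and regroup the sentences two at a time.
    sentences = [s for s in text.split(". ") if s]
    return [". ".join(sentences[i:i + split_length])
            for i in range(0, len(sentences), split_length)]

chunks = naive_sentence_split("A. B. C. D. E.")
print(chunks)  # ['A. B', 'C. D', 'E.']
```

Short, two-sentence chunks keep each indexed unit focused, which helps keyword retrieval later.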
RAG Pipeline
from haystack.components.builders import PromptBuilder
prompt_template = """
<start_of_turn>user
Using the information contained in the context, give a comprehensive answer to the question.
If the answer is contained in the context, also report the source URL.
If the answer cannot be deduced from the context, do not give an answer.
Context:
{% for doc in documents %}
{{ doc.content }} URL:{{ doc.meta['url'] }}
{% endfor %}
Question: {{query}}<end_of_turn>
<start_of_turn>model
"""
prompt_builder = PromptBuilder(template=prompt_template)
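PromptBuilder renders this template with Jinja2. As a minimal sketch of what the rendered prompt looks like, here is the same kind of template rendered directly with jinja2, using a toy document with made-up content:

```python
from jinja2 import Template

class Doc:
    """Toy stand-in for a retrieved Haystack Document (made-up content)."""
    def __init__(self, content, url):
        self.content = content
        self.meta = {"url": url}

docs = [Doc("Audioslave was formed in 2001.",
            "https://en.wikipedia.org/wiki/Audioslave")]

template = Template(
    "<start_of_turn>user\n"
    "Using the information contained in the context, "
    "give a comprehensive answer to the question.\n"
    "Context:\n"
    "{% for doc in documents %}{{ doc.content }} URL:{{ doc.meta['url'] }}\n"
    "{% endfor %}"
    "Question: {{query}}<end_of_turn>\n<start_of_turn>model\n"
)

prompt = template.render(documents=docs, query="When was Audioslave formed?")
print(prompt)
```

Note the <start_of_turn>/<end_of_turn> markers: they follow Gemma's chat format, so the model sees the context and question as a user turn and continues from the model turn.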
Here, we use the HuggingFaceAPIGenerator, since this is not a chat setting: we don't envision a multi-turn conversation, just RAG.
from haystack.components.generators import HuggingFaceAPIGenerator
generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "google/gemma-7b-it"},
    generation_kwargs={"max_new_tokens": 500})
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", generator)
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "llm.prompt")
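The BM25 retriever scores each chunk by how well its terms match the query, weighting rare terms more heavily. A toy sketch of the idea (a simplified BM25-style score, not Haystack's exact implementation):

```python
import math
from collections import Counter

def toy_bm25_scores(query, docs, k1=1.5, b=0.75):
    # Score each document against the query with a BM25-style formula.
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rare terms weigh more
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = ["audioslave was formed in 2001",
        "the cure is an english rock band",
        "nirvana was an american rock band"]
print(toy_bm25_scores("who formed audioslave", docs))
```

InMemoryBM25Retriever then passes the top_k highest-scoring chunks on to the prompt builder.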
Let’s ask some questions!
def get_generative_answer(query):
    results = rag.run({
        "retriever": {"query": query},
        "prompt_builder": {"query": query}
    })
    answer = results["llm"]["replies"][0]
    rich.print(answer)
get_generative_answer("Audioslave was formed by members of two iconic bands. Can you name the bands and discuss the sound of Audioslave in comparison?")
nice_questions_to_try="""What was the original name of Sum 41?
What was the title of Nirvana's breakthrough album released in 1991?
Green Day's "American Idiot" is a rock opera. What's the story it tells?
Audioslave was formed by members of two iconic bands. Can you name the bands and discuss the sound of Audioslave in comparison?
Evanescence's "Bring Me to Life" features a male vocalist. Who is he, and how does his voice complement Amy Lee's in the song?
What is Sum 41's debut studio album called?
Who was the lead singer of Audioslave?
When was Nirvana's first studio album, "Bleach," released?
Were the Smiths an influential band?
What is the name of Evanescence's debut album?
Which band was Morrissey the lead singer of before he formed The Smiths?
Dire Straits' hit song "Money for Nothing" features a guest vocal by a famous artist. Who is this artist?
Who played the song "Like a stone"?""".split('\n')
q = random.choice(nice_questions_to_try)
print(q)
get_generative_answer(q)
This is a simple demo. We can improve the RAG pipeline with better retrieval techniques: embedding retrieval, hybrid retrieval…
(Notebook by Stefano Fiorucci)