Function Calling and Multimodal QA with Gemini
Last Updated: December 10, 2024
by Tuana Celik (Twitter, LinkedIn), Tilde Thurium (Twitter, LinkedIn), and Silvano Cerza (LinkedIn)
Check out the Gemini Models with Google Vertex AI Integration for Haystack article for a detailed walkthrough of this example.
This is a notebook showing how you can use Gemini with Haystack 2.0.
Gemini is Google’s family of multimodal models. You can read more about its capabilities here.
Install dependencies
As a prerequisite, you need to have a Google Cloud Project set up that has access to Gemini. Following that, you’ll only need to authenticate yourself in this Colab.
First things first, we need to install our dependencies.
(You can ignore the pip dependency error for cohere and tiktoken; it is irrelevant for our purposes.)
!pip install --upgrade haystack-ai google-vertex-haystack trafilatura
To use Gemini you need to have a Google Cloud Platform account and be logged in using Application Default Credentials (ADCs). For more info see the official documentation.
Time to log in!
from google.colab import auth
auth.authenticate_user()
Remember to set the project_id variable to a valid project ID that you are authorized to use with Gemini.
We’re going to use this one throughout the example!
You can find your project ID in the GCP resource manager, or locally by running gcloud projects list in your terminal. For more info on the gcloud CLI see the official documentation.
project_id = input("Enter your project ID:")
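If you run this notebook repeatedly, or outside Colab, typing the project ID each time gets tedious. Here is a minimal sketch that first looks for the ID in an environment variable and only prompts when it is missing. The helper name `resolve_project_id` is hypothetical, and reading `GOOGLE_CLOUD_PROJECT` is an assumption about how your environment is configured.

```python
import os

def resolve_project_id(env=os.environ, ask=input) -> str:
    """Return a project ID from the environment, or prompt for one (hypothetical helper)."""
    # Assumption: the project ID may already be exported as GOOGLE_CLOUD_PROJECT.
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        # Fall back to asking interactively, as the notebook does.
        project_id = ask("Enter your project ID:").strip()
    if not project_id:
        raise ValueError("A Google Cloud project ID is required.")
    return project_id
```

You could then write `project_id = resolve_project_id()` in place of the bare `input()` call.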
Use gemini-1.5-flash
Answer Questions
Now that we have set everything up, we can create an instance of our Gemini component.
from haystack_integrations.components.generators.google_vertex import VertexAIGeminiGenerator
gemini = VertexAIGeminiGenerator(model="gemini-1.5-flash", project_id=project_id)
Let’s start by asking something simple.
This component expects a list of Parts as input to the run() method. Parts can be anything from a message, to images, or even function calls. The docstrings in the source code, linked here, are the most up-to-date reference we could find.
result = gemini.run(parts=["What is the most interesting thing you know?"])
for answer in result["replies"]:
    print(answer)
Answer Questions about Images
Let’s try something a bit different! gemini-1.5-flash can also work with images, so let’s see if we can have it answer questions about some robots.
We’re going to download some images for this example.
import requests
from haystack.dataclasses.byte_stream import ByteStream
URLS = [
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot2.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot3.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot4.jpg"
]
images = [
    ByteStream(data=requests.get(url).content, mime_type="image/jpeg")
    for url in URLS
]
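The list comprehension above hard-codes image/jpeg, which is fine for these four files. If your own URLs mix formats, one option is to guess the MIME type from the file extension with Python’s standard mimetypes module. This is a minimal sketch; the helper name `guess_image_mime_type` and its fallback value are assumptions for illustration, not part of Haystack.

```python
import mimetypes

def guess_image_mime_type(url: str, default: str = "image/jpeg") -> str:
    """Guess an image MIME type from a URL's file extension (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(url)
    # Fall back to the default when the extension is unknown or not an image type.
    if mime_type is None or not mime_type.startswith("image/"):
        return default
    return mime_type

print(guess_image_mime_type("https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg"))
```

You could then pass `mime_type=guess_image_mime_type(url)` when building each ByteStream.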
Next, let’s run the VertexAIGeminiGenerator component on its own.
result = gemini.run(parts=["What can you tell me about these robots?", *images])
for answer in result["replies"]:
    print(answer)
Did Gemini recognize all its friends?
Function Calling with gemini-pro
With gemini-pro, we can also start introducing function calling!
So let’s see how we can do that.
Let’s see if we can build a system that can run a get_current_weather function, based on a question asked in natural language.
First we create our function definition and tool.
For demonstration purposes, we’re simply creating a get_current_weather function that returns an object which will always tell us it’s sunny and 21.8 degrees. If it’s Celsius, that’s a good day!
def get_current_weather(location: str, unit: str = "celsius"):
    return {"weather": "sunny", "temperature": 21.8, "unit": unit}
Now we have to provide this function as a Tool to Gemini. So, first we need to create a FunctionDeclaration that explains this function to Gemini.
from vertexai.generative_models import Tool, FunctionDeclaration
get_current_weather_func = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather in a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {
                "type": "string",
                "enum": [
                    "celsius",
                    "fahrenheit",
                ],
            },
        },
        "required": ["location"],
    },
)
tool = Tool([get_current_weather_func])
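The parameters dictionary above tells Gemini which arguments exist, which are required, and which values are allowed. Since a model can still return arguments that don’t match, it may help to check them before calling your function. The following is a minimal sketch that uses only the standard library and covers just the required, string, and enum rules used in this example; `check_arguments` and `WEATHER_PARAMETERS` are hypothetical names, and this is not a full JSON Schema validator.

```python
from typing import Any

# Same shape as the `parameters` argument of the FunctionDeclaration above.
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

def check_arguments(arguments: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    problems = []
    properties = schema.get("properties", {})
    # Every required argument must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared, a string, and within its enum if one is given.
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        spec = properties[name]
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

print(check_arguments({"location": "Berlin", "unit": "celsius"}, WEATHER_PARAMETERS))
```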
This time we’re going to chat with Gemini, so we’ll use another class for this.
We also need the Gemini Pro model to use functions; Gemini Pro Vision doesn’t support them.
Let’s create a VertexAIGeminiChatGenerator:
from haystack_integrations.components.generators.google_vertex import VertexAIGeminiChatGenerator
gemini_chat = VertexAIGeminiChatGenerator(model="gemini-pro", project_id=project_id, tools=[tool])
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user(content = "What is the temperature in celsius in Berlin?")]
res = gemini_chat.run(messages=messages)
res["replies"]
Look at that! We got a message with some interesting information. We can use that information to call a real function locally.
Let’s do exactly that and pass the result back to Gemini.
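import json
# Assumption: the reply text holds the function arguments as a JSON string.
# Unpacking the raw string with ** would raise a TypeError, so we parse it first.
weather = get_current_weather(**json.loads(res["replies"][0].text))
messages += res["replies"] + [ChatMessage.from_function(content=weather, name="get_current_weather")]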
res = gemini_chat.run(messages = messages)
res["replies"][0].text
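The round trip above follows a general pattern: the model names a function and supplies arguments, your code looks the function up and runs it, and the result goes back to the model. Here is a minimal sketch of the local half of that loop, with no network calls. The `TOOLS` registry, the `dispatch` helper, and the `fake_call` dictionary are assumptions for illustration; the actual shape of Gemini’s reply depends on the integration version you have installed.

```python
import json

def get_current_weather(location: str, unit: str = "celsius"):
    return {"weather": "sunny", "temperature": 21.8, "unit": unit}

# Registry mapping function names to local callables (hypothetical name).
TOOLS = {"get_current_weather": get_current_weather}

def dispatch(name: str, arguments_json: str) -> str:
    """Run the named local function and return its result as a JSON string."""
    if name not in TOOLS:
        raise KeyError(f"unknown function: {name}")
    arguments = json.loads(arguments_json)
    result = TOOLS[name](**arguments)
    return json.dumps(result)

# A stand-in for what the model asks for; not the real reply format.
fake_call = {
    "name": "get_current_weather",
    "arguments": '{"location": "Berlin", "unit": "celsius"}',
}
print(dispatch(fake_call["name"], fake_call["arguments"]))
```

Keeping the registry explicit means the model can only trigger functions you have deliberately listed.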
Seems like the weather is nice and sunny, remember to put on your sunglasses.
Build a full Retrieval-Augmented Generation Pipeline with gemini-1.5-flash
As a final exercise, let’s add the VertexAIGeminiGenerator to a full RAG pipeline. In the example below, we are building a RAG pipeline that does question answering on the web, using gemini-1.5-flash.
from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack import Pipeline
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
document_splitter = DocumentSplitter(split_by="word", split_length=50)
similarity_ranker = TransformersSimilarityRanker(top_k=3)
gemini = VertexAIGeminiGenerator(model="gemini-1.5-flash", project_id=project_id)
prompt_template = """
According to these documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Answer the given question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=prompt_template)
pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("splitter", document_splitter)
pipeline.add_component("ranker", similarity_ranker)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("gemini", gemini)
pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "gemini")
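To get a feel for what the middle of this pipeline does, here is a toy, dependency-free sketch of the split, rank, and prompt steps. It is not Haystack’s implementation: the real ranker scores chunks with a transformer model, while this stand-in simply counts shared words, and all function names here are hypothetical.

```python
def split_by_words(text: str, split_length: int = 50) -> list[str]:
    """Cut text into chunks of at most `split_length` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + split_length]) for i in range(0, len(words), split_length)]

def overlap_score(query: str, chunk: str) -> int:
    """Crude relevance score: the number of distinct words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def top_k_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Keep the `top_k` highest-scoring chunks; ties keep their original order."""
    return sorted(chunks, key=lambda chunk: overlap_score(query, chunk), reverse=True)[:top_k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt with the same layout as the template above."""
    lines = ["According to these documents:"]
    lines.extend(chunks)
    lines.append(f"Answer the given question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)

text = "pipelines are graphs of components and cats like to sleep all day long"
chunks = split_by_words(text, split_length=5)
best = top_k_chunks("graphs of components", chunks, top_k=1)
print(build_prompt("What do graphs have to do with Haystack?", best))
```

In the real pipeline, the splitter and ranker work on Document objects rather than plain strings, and the prompt is rendered from the Jinja template by PromptBuilder.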
Let’s try asking Gemini to tell us about Haystack 2.0 and how to use it.
question = "What do graphs have to do with Haystack?"
result = pipeline.run({"prompt_builder": {"question": question},
                       "ranker": {"query": question},
                       "fetcher": {"urls": ["https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent"]}})
for answer in result["gemini"]["replies"]:
    print(answer)
Now you’ve seen some of what Gemini can do, as well as how to integrate it with Haystack 2.0. If you want to learn more:
- Check out the Haystack docs or tutorials
- Try out the Gemini quickstart colab from Google
- Participate in the Advent of Haystack