Build with Gemma 4 and Haystack



Gemma 4 is a family of open models by Google.

Some interesting facts:

  • 4 sizes (E2B, E4B, 26B4A, 31B)
  • Reasoning models
  • Support for tool calling and structured outputs
  • Multimodal
  • Apache 2.0 license
  • Up to 256K context window
  • Trained on 140+ languages

We will see some examples of what we can build with Gemma 4 and the Haystack LLM framework: from RAG to multimodal agents.

Installation

First of all, to run the model at a usable speed, enable the GPU runtime in Colab.

Let’s install the Python packages we need.

! uv pip install haystack-ai mcp-haystack ollama-haystack nest_asyncio pillow pypdfium2 python-weather wikipedia

In this notebook we’ll use Ollama to serve Gemma 4 E4B. So we install it and download the model.

! apt-get install zstd pciutils && curl -fsSL https://ollama.com/install.sh | sh
! setsid nohup ollama serve > ollama.log 2>&1 < /dev/null &
! until curl -s http://127.0.0.1:11434 > /dev/null; do sleep 0.5; done; echo "ready"
! ollama pull gemma4:e4b
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pciutils is already the newest version (1:3.7.0-6).
zstd is already the newest version (1.4.8+dfsg-3build1).
0 upgraded, 0 newly installed, 0 to remove and 42 not upgraded.
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
WARNING: systemd is not running
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
ready

RAG with Gemma (about Rock music) 🎸

Why not start with some classic RAG? Let’s see how this model performs.

In this notebook we’ll use Ollama to serve a quantized version of Gemma 4 E4B. This model has 4.5B effective parameters (8B with embeddings) and is designed for on-device use.

There are also other options available to serve this model, like Transformers or llama.cpp. For production workloads, vLLM is recommended.

All of these options are supported in Haystack.

Load data from Wikipedia

favourite_bands = """Audioslave
Blink-182
Dire Straits
Evanescence
Green Day
Muse (band)
Nirvana (band)
Sum 41
The Cure
The Smiths""".split("\n")

import random
import wikipedia
from haystack.dataclasses import Document

raw_docs = []

for title in favourite_bands:
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url":page.url})
    raw_docs.append(doc)

Indexing Pipeline

from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

document_store = InMemoryDocumentStore()

indexing = Pipeline()
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=2))
indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))

indexing.connect("cleaner", "splitter")
indexing.connect("splitter", "writer")

indexing.run({"cleaner":{"documents":raw_docs}})
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
{'writer': {'documents_written': 662}}
document_store.filter_documents()[0].meta
{'title': 'Audioslave',
 'url': 'https://en.wikipedia.org/wiki/Audioslave',
 'source_id': 'ab025d6385ab46b73dcf198e52d084f6e62f0c210ea3ecca1ce317d4ac21858c',
 'page_number': 1,
 'split_id': 0,
 'split_idx_start': 0}

RAG Pipeline

We set up the OllamaChatGenerator with think=False: RAG doesn’t require extended reasoning here, so we disable thinking mode to keep answers fast. We also pass print_streaming_chunk as the streaming callback, a helper that prints tokens to stdout as they arrive.
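The callback doesn't have to print: any callable that accepts a streaming chunk works. Here is a minimal sketch of a custom callback that collects tokens while echoing them, assuming (as with Haystack's `StreamingChunk`) that each chunk exposes a `.content` string; the stand-in chunks below are hypothetical, so no LLM is needed to try it.

```python
from types import SimpleNamespace

# Collected tokens, so the full answer is available after streaming ends.
collected = []

def collecting_callback(chunk):
    """Print each token as it arrives and keep a copy for later use."""
    collected.append(chunk.content)
    print(chunk.content, end="", flush=True)

# Toy usage with stand-in chunks (hypothetical; a real generator would
# call the callback with StreamingChunk objects as tokens arrive):
for token in ["Hello", ", ", "world"]:
    collecting_callback(SimpleNamespace(content=token))

full_answer = "".join(collected)
```

A callback like this is handy when you want streaming output in the UI but also need the complete answer afterwards.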

from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.generators.utils import print_streaming_chunk
from haystack.document_stores.in_memory import InMemoryDocumentStore

from haystack_integrations.components.generators.ollama import OllamaChatGenerator

prompt_template = """
{% message role="user"%}
Using the information contained in the context, give a comprehensive answer to the question.
If the answer is contained in the context, also report the source URL.
If the answer cannot be deduced from the context, do not give an answer.

Context:
{% for doc in documents %}
{{ doc.content }} URL:{{ doc.meta['url'] }}
{% endfor %};

Question: {{query}}
{% endmessage %}
"""
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

llm = OllamaChatGenerator(
    model="gemma4:e4b",
    think=False,
    streaming_callback=print_streaming_chunk,
    timeout=180)


rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", llm)

rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7afae4edcd10>
🚅 Components
  - retriever: InMemoryBM25Retriever
  - prompt_builder: ChatPromptBuilder
  - llm: OllamaChatGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])
def get_generative_answer(query):
  rag.run({
      "retriever": {"query": query},
      "prompt_builder": {"query": query}
    }
  )
get_generative_answer("Dire Straits' hit song 'Money for Nothing' features a guest vocal by a famous artist. Who is this artist?")
[ASSISTANT]
The guest vocalist on Dire Straits' hit song "Money for Nothing" was Sting.

Source: https://en.wikipedia.org/wiki/Dire_Straits
nice_questions_to_try="""What was the original name of Sum 41?
What was the title of Nirvana's breakthrough album released in 1991?
Green Day's "American Idiot" is a rock opera. What's the story it tells?
Audioslave was formed by members of two iconic bands. Can you name the bands and discuss the sound of Audioslave in comparison?
Evanescence's "Bring Me to Life" features a male vocalist. Who is he, and how does his voice complement Amy Lee's in the song?
What is Sum 41's debut studio album called?
Who was the lead singer of Audioslave?
When was Nirvana's first studio album, "Bleach," released?
Were the Smiths an influential band?
What is the name of Evanescence's debut album?
Which band was Morrissey the lead singer of before he formed The Smiths?
Dire Straits' hit song "Money for Nothing" features a guest vocal by a famous artist. Who is this artist?
Who played the song "Like a stone"?""".split('\n')
q = random.choice(nice_questions_to_try)
print(q)
get_generative_answer(q)
Who was the lead singer of Audioslave?
[ASSISTANT]
The lead singer of Audioslave was Chris Cornell. Source: https://en.wikipedia.org/wiki/Audioslave

📷 Image Question Answering

Time to test the multimodal capabilities of this model.

Let’s download the “Scaling Instruction-Finetuned Language Models” paper by Google and ask a question about a plot.

This time we’ll use the model in thinking mode.

! wget "https://arxiv.org/pdf/2210.11416.pdf" -O flan_paper.pdf
from haystack.components.converters.image import PDFToImageContent
from haystack.components.generators.utils import print_streaming_chunk

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

pdf_converter = PDFToImageContent()
paper_page_image = pdf_converter.run(sources=["flan_paper.pdf"], page_range="9")["image_contents"][0]
paper_page_image.show()
user_message = ChatMessage.from_user(content_parts=["What is the main takeaway of Figure 6? Be brief and accurate.", paper_page_image])

llm = OllamaChatGenerator(
    model="gemma4:e4b",
    think=True,
    streaming_callback=print_streaming_chunk)

_ = llm.run([user_message])
[REASONING]
Thinking Process:

1.  **Analyze the Request:** The user wants to know the main takeaway of Figure 6, and the answer must be brief and accurate.
2.  **Analyze Figure 6 (The Image):**
    *   **Title/Caption:** "Figure 6: Zero-shot performance of PaLM and Flan-PaLM on a set of 23 challenging BIG-Bench tasks (BBH). Flan-PaLM benefits from chain-of-thought (CoT) generation activated via 'let's think step-by-step.'"
    *   **Axes:** Y-axis is BBH accuracy (%). X-axis shows different model sizes/conditions (8B, 62B, 540B).
    *   **Legend/Data Series:**
        *   Blue (PaLM: Zero-shot)
        *   Orange (PaLM: Zero-shot + CoT)
        *   Light Blue (Flan-PaLM: Zero-shot)
        *   Dark Blue (Flan-PaLM: Zero-shot + CoT)
    *   **Observe Trends:**
        *   *PaLM vs Flan-PaLM:* Flan-PaLM generally performs better than PaLM.
        *   *Zero-shot vs + CoT:* Adding CoT significantly boosts performance for *both* models (compare Blue to Orange, and Light Blue to Dark Blue).
        *   *Magnitude of Improvement:* The most noticeable gains are seen when Flan-PaLM is combined with CoT (Dark Blue bars).
3.  **Analyze the Surrounding Text (Contextualizing the Takeaway):**
    *   The text discusses CoT helping models perform better, especially for complex tasks.
    *   It specifically mentions that "Flan-PaLM benefits from CoT generation activated via 'let's think step-by-step.'"
4.  **Synthesize the Main Takeaway:**
    *   The primary finding is the effectiveness of Chain-of-Thought (CoT) prompting.
    *   *Which model benefits the most?* Flan-PaLM.
    *   *What is the core mechanism?* CoT prompting significantly improves zero-shot performance on challenging tasks (BBH).
5.  **Draft the Answer (Focusing on brevity and accuracy):** Flan-PaLM substantially improves zero-shot performance on challenging tasks (BBH) when Chain-of-Thought (CoT) prompting is used, outperforming both PaLM and the non-CoT versions of Flan-PaLM. (This is accurate but slightly long.)
6.  **Refine the Answer (Making it concise):** The main takeaway is that Chain-of-Thought (CoT) prompting significantly boosts the zero-shot performance of Flan-PaLM on complex tasks (BBH), making it superior to both base PaLM and Flan-PaLM models lacking CoT. (This is precise and brief.)

[ASSISTANT]
The main takeaway is that **Chain-of-Thought (CoT) prompting significantly boosts the zero-shot performance of Flan-PaLM** on complex tasks (BBH), demonstrating that providing the model with step-by-step reasoning ability is crucial for superior performance compared to base PaLM or Flan-PaLM models without CoT.

Nice! To dive deeper into Haystack’s multimodal capabilities, check out the multimodal tutorials in the Haystack documentation.

⛅ Multimodal Weather Agent

We give Gemma a get_weather tool and a map image: the agent should identify the city from the image, then call the tool to retrieve current conditions.

A Haystack Agent is an LLM/ChatGenerator wrapped in an automatic tool-use loop: it executes the tools the model requests and feeds the results back until the model produces a final answer.
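To make that loop concrete, here is a toy sketch of it in plain Python. This is not Haystack's implementation; the stand-in model, message format, and tool are all hypothetical.

```python
def toy_model(messages):
    """Stand-in LLM: requests the weather tool once, then answers."""
    tool_messages = [m for m in messages if m["role"] == "tool"]
    if not tool_messages:
        return {"tool_call": {"name": "get_weather", "args": {"location": "Florence"}}}
    return {"final_answer": f"The weather in Florence: {tool_messages[-1]['content']}"}

def run_agent(model, tools, user_message, max_steps=5):
    """Minimal tool-use loop: call the model, execute requested tools,
    feed results back, stop when the model produces a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = model(messages)
        if "final_answer" in reply:
            return reply["final_answer"]
        call = reply["tool_call"]
        result = tools[call["name"]](**call["args"])  # execute the tool...
        messages.append({"role": "tool", "content": str(result)})  # ...and feed it back
    raise RuntimeError("agent did not converge")

tools = {"get_weather": lambda location: "sunny, 24°C"}
answer = run_agent(toy_model, tools, "What's the weather in Florence?")
```

Haystack's `Agent` does the same bookkeeping for you, on top of real `ChatMessage` objects and tool schemas.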

! wget https://cdn.generazioneviaggio.it/2020/12/florence-travel-map-e1549036645956.jpg -O city_map.jpg
import asyncio
from typing import Annotated

from haystack.tools import tool

from haystack.dataclasses import ChatMessage, ImageContent
import python_weather

# only needed in Jupyter notebooks where there is an event loop running
import nest_asyncio
nest_asyncio.apply()


@tool
def get_weather(location: Annotated[str, "The location to get the weather for"]) -> dict:
    """A function to get the weather for a given location"""
    async def _fetch_weather():
        async with python_weather.Client(unit=python_weather.METRIC) as client:
            weather = await client.get(location)
            return {
                "description": weather.description,
                "temperature": weather.temperature,
                "humidity": weather.humidity,
                "precipitation": weather.precipitation,
                "wind_speed": weather.wind_speed,
                "wind_direction": weather.wind_direction
            }

    return asyncio.run(_fetch_weather())
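The sync-over-async wrapper used in `get_weather` is a general pattern for exposing async clients as plain tool functions. Stripped of the weather specifics, with a toy coroutine standing in for the `python_weather` client:

```python
import asyncio

async def _fetch(value):
    """Stand-in for an async client call (hypothetical, in place of
    python_weather's client.get)."""
    await asyncio.sleep(0)  # simulate awaiting I/O
    return {"result": value}

def fetch_sync(value):
    """Expose the async call behind a plain synchronous function,
    so it can be registered as a regular tool."""
    return asyncio.run(_fetch(value))

out = fetch_sync("Florence")
```

Note that `asyncio.run` fails inside an already-running event loop, which is exactly why the notebook applies `nest_asyncio` first.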
from haystack.components.agents import Agent
from haystack.components.generators.utils import print_streaming_chunk
from haystack_integrations.components.generators.ollama import OllamaChatGenerator



llm = OllamaChatGenerator(
    model="gemma4:e4b",
    think=True)

agent = Agent(chat_generator=llm, tools=[get_weather], streaming_callback=print_streaming_chunk)
map_image = ImageContent.from_file_path("city_map.jpg")
map_image.show()
content_parts = ["""Detect the city represented in the map and find the weather.
Your final response should include city name, top 3 places to visit, and brief description of the weather.""",
                 map_image]
messages = agent.run([ChatMessage.from_user(content_parts=content_parts)])["messages"]
[REASONING]
The user wants me to perform three tasks based on the provided map:
1.  Detect the city represented in the map.
2.  Find the top 3 places to visit in that city.
3.  Find the current weather for that city and include a brief description of it.

I have access to a tool: `get_weather(location: str)`.

**Step 1: Detect the city.**
I need to analyze the map image. The map shows a river (labeled "ARNO") and several prominent historical structures, including what looks like the Duomo (Florence Cathedral) and other major landmarks typical of Italy. A quick visual identification points strongly to Florence, Italy.

**Step 2: Find top 3 places to visit.**
Since the city is Florence, I can list 3 well-known attractions:
*   The Duomo (Florence Cathedral)
*   Ponte Vecchio
*   Uffizi Gallery / Accademia Gallery (or Pitti Palace)

**Step 3: Get the weather.**
I need to use the `get_weather` tool with the location "Florence".

**Plan:**
1. Call `get_weather` for "Florence".
2. Wait for the tool output.
3. Compile the final response incorporating the city name, top 3 places, and the weather description.

[TOOL CALL]
Tool: get_weather 
Arguments: {"location": "Florence"}

[TOOL RESULT]
{'description': 'Patchy rain nearby', 'temperature': 24, 'humidity': 47, 'precipitation': 0.0, 'wind_speed': 13, 'wind_direction': WindDirection.NORTH_NORTHEAST}

[ASSISTANT]
The city represented in the map is **Florence**, Italy.

**Top 3 Places to Visit:**
1.  **Florence Cathedral (Duomo):** A masterpiece of Renaissance architecture, famous for Brunelleschi's dome.
2.  **Ponte Vecchio:** A medieval bridge spanning the Arno River, historically known for its shops built along it.
3.  **Uffizi Gallery:** One of the most important art museums in the world, housing incredible Renaissance paintings.

**Current Weather:**
The weather in Florence is currently **Patchy rain nearby**. The temperature is **24°C**, with a humidity of 47%. The wind speed is 13 km/h coming from the North-Northeast.

🔎💻 GitHub Search Agent

Now we’ll build a GitHub search agent by connecting Gemma to the official GitHub MCP server. MCP (Model Context Protocol) is a standard way for LLMs to discover and call external tools.

To run this experiment, you’ll need to pass a GitHub Personal Access Token with read permissions.

import os
from getpass import getpass

os.environ["GITHUB_PAT"] = getpass("GitHub Personal Access Token")
from haystack.components.agents import Agent
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack.tools import SearchableToolset
from haystack.utils import Secret
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack_integrations.tools.mcp import MCPToolset, StreamableHttpServerInfo
mcp_server_info = StreamableHttpServerInfo(
    url="https://api.githubcopilot.com/mcp/",
    token=Secret.from_env_var("GITHUB_PAT"))

toolset = MCPToolset(
    server_info=mcp_server_info,
)

The GitHub MCP server exposes many tools, and their schemas alone can easily exceed Ollama’s default 4k context on Colab. As a first attempt, we bump num_ctx to 8192.

Another approach would be to hand-pick only the relevant tools, but we’d rather have the agent figure it out on its own.

If you’re running on a more powerful machine, you should be able to raise the context window enough to avoid these problems.
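A quick back-of-envelope check helps to see whether a tool catalog fits the context budget. The sketch below uses the rough heuristic of ~4 characters per token; the tool schemas are made up for illustration, standing in for what an MCP server exposes.

```python
import json

def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate: ~4 characters per token."""
    return len(text) // chars_per_token

# Hypothetical tool schemas, standing in for a real MCP tool catalog.
schemas = [
    {"name": f"tool_{i}", "description": "does something useful",
     "parameters": {"type": "object", "properties": {"q": {"type": "string"}}}}
    for i in range(40)
]
catalog_json = json.dumps(schemas)
budget = 8192
print(f"~{estimate_tokens(catalog_json)} tokens of schemas vs num_ctx={budget}")
```

If the schemas alone eat a large fraction of `num_ctx`, there is little room left for the conversation and tool results, which is what bites us below.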

generator = OllamaChatGenerator(model="gemma4:e4b", think=True, generation_kwargs={"num_ctx": 8192}, timeout=360)
agent = Agent(
    system_prompt = """You are a highly capable GitHub agent who can infer missing details and act autonomously.
If a task fails, analyze the cause, adapt your approach, and retry using a different strategy.
Persist until the task is completed or all reasonable solutions are exhausted.""",
    chat_generator=generator,
    tools=toolset,
    streaming_callback=print_streaming_chunk,
)
question = "What are the 5 most popular projects by karpathy user and what's their focus? Very short answer."
result = agent.run([ChatMessage.from_user(question)])
[TOOL CALL]
Tool: search 
Arguments: {"query": "user:janekakle github"}



ERROR:haystack.components.tools.tool_invoker:Tool 'search' not found. Available tools: add_comment_to_pending_review, add_issue_comment, add_reply_to_pull_request_comment, assign_copilot_to_issue, create_branch, create_or_update_file, create_pull_request, create_pull_request_with_copilot, create_repository, delete_file, fork_repository, get_commit, get_copilot_job_status, get_file_contents, get_label, get_latest_release, get_me, get_release_by_tag, get_tag, get_team_members, get_teams, issue_read, issue_write, list_branches, list_commits, list_issue_types, list_issues, list_pull_requests, list_releases, list_tags, merge_pull_request, pull_request_read, pull_request_review_write, push_files, request_copilot_review, run_secret_scanning, search_code, search_issues, search_pull_requests, search_repositories, search_users, sub_issue_write, update_pull_request, update_pull_request_branch


[ASSISTANT]
What do you want to do? I can help with tasks like:

* **Reading code:** Finding information about repositories, issues, pull requests, etc.
* **Interacting with code:** Creating files, committing changes, creating issues, opening pull requests.
* **Searching/Browsing:** Searching for code or repositories.

Please specify your goal or the repository/context you are working with!
! tail ollama.log
[GIN] 2026/04/16 - 15:35:43 | 200 | 10.336574493s |       127.0.0.1 | POST     "/api/chat"
time=2026-04-16T15:35:49.667Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=8192 prompt=9233 keep=4 new=8192
[GIN] 2026/04/16 - 15:35:55 | 200 | 12.062062297s |       127.0.0.1 | POST     "/api/chat"
time=2026-04-16T15:36:33.791Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=8192 prompt=9297 keep=4 new=8192
time=2026-04-16T15:36:39.443Z level=WARN source=gemma4.go:319 msg="gemma4 tool call flush on done failed" error="expected '{' in tool call" content="call:search_code(\"user:karpathy projects\")\n"
[GIN] 2026/04/16 - 15:36:39 | 200 |  10.92845609s |       127.0.0.1 | POST     "/api/chat"
time=2026-04-16T15:36:59.274Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=8192 prompt=9297 keep=4 new=8192
[GIN] 2026/04/16 - 15:37:05 | 200 | 10.722681345s |       127.0.0.1 | POST     "/api/chat"
time=2026-04-16T15:37:13.018Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=8192 prompt=9269 keep=4 new=8192
[GIN] 2026/04/16 - 15:37:20 | 200 | 15.361395497s |       127.0.0.1 | POST     "/api/chat"

Oh no!

The prompt got truncated, so the model lost part of the tool catalog and ended up hallucinating a non-existent search tool.

Time to try Haystack’s SearchableToolset, a class that lets agents dynamically discover tools from large catalogs via keyword-based search. Tool schemas are loaded on demand, so the context stays lean.
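Conceptually, the idea can be sketched in a few lines. This is a toy illustration, not Haystack's actual implementation: tools are scored by keyword overlap against their name and description, and only the top matches are loaded.

```python
def search_tools(catalog, keywords, k=3):
    """Score tools by keyword overlap with name + description; return top-k names."""
    words = set(keywords.lower().split())

    def score(tool):
        text = (tool["name"] + " " + tool["description"]).lower()
        return sum(1 for w in words if w in text)

    ranked = sorted(catalog, key=score, reverse=True)
    return [t["name"] for t in ranked[:k] if score(t) > 0]

# Hypothetical mini-catalog mirroring a few GitHub MCP tools.
catalog = [
    {"name": "search_repositories", "description": "search GitHub repositories"},
    {"name": "list_branches", "description": "list branches of a repository"},
    {"name": "merge_pull_request", "description": "merge a pull request"},
]
found = search_tools(catalog, "github repositories projects", k=2)
```

Because only the matched tools' full schemas reach the model, the prompt stays well under the context budget even with a large catalog.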

searchable_toolset = SearchableToolset(catalog=toolset)

generator = OllamaChatGenerator(model="gemma4:e4b", think=True, generation_kwargs={"num_ctx": 8192}, timeout=360)
agent = Agent(
    system_prompt = """You are a highly capable GitHub agent who can infer missing details and act autonomously.
If a task fails, analyze the cause, adapt your approach, and retry using a different strategy.
Persist until the task is completed or all reasonable solutions are exhausted.""",
    chat_generator=generator,
    tools=searchable_toolset,
    streaming_callback=print_streaming_chunk,
)
result = agent.run([ChatMessage.from_user(question)])
[REASONING]
The user is asking for the "5 most popular projects by karpathy user" and their focus, requesting a very short answer.

I need to determine if I have a tool that can access this kind of information (e.g., GitHub profile analysis, popular projects list, specific user statistics).

Since I don't have any initial tools loaded, I must use the `search_tools` tool first to see what capabilities are available. The key entities are "karpathy", "projects", and "popular".

I will use "karpathy", "projects", and "github" as keywords.

[TOOL CALL]
Tool: search_tools 
Arguments: {"k": 3, "tool_keywords": "karpathy github projects"}

[TOOL RESULT]
Found and loaded 3 tool(s): search_repositories, list_branches, list_releases. Use them directly as tools.

[REASONING]
The search tools revealed that I have `search_repositories`, `list_branches`, and `list_releases`. These tools are related to GitHub operations.

To find the "5 most popular projects by karpathy user," the most appropriate tool is `search_repositories`, as it allows searching for repositories by user, popularity (sorting by stars), and keywords.

The user specified "by karpathy user." I will assume "karpathy" is the username.

I will use `search_repositories` with:
1.  `query`: `user:karpathy` (to filter by the user)
2.  `sort`: `stars` (to find the "most popular")
3.  `perPage`: 5 (to get the top 5)

I will execute this plan.

[TOOL CALL]
Tool: search_repositories 
Arguments: {"query": "user:karpathy", "sort": "stars", "perPage": 5}

[TOOL RESULT]
{"meta":null,"content":[{"type":"text","text":"{\"total_count\":54,\"incomplete_results\":false,\"items\":[{\"id\":1174820787,\"name\":\"autoresearch\",\"full_name\":\"karpathy/autoresearch\",\"description\":\"AI agents running research on single-GPU nanochat training automatically\",\"html_url\":\"https://github.com/karpathy/autoresearch\",\"language\":\"Python\",\"stargazers_count\":73206,\"forks_count\":10669,\"open_issues_count\":181,\"updated_at\":\"2026-04-16T15:36:48Z\",\"created_at\":\"2026-03-06T22:00:43Z\",\"private\":false,\"fork\":false,\"archived\":false,\"default_branch\":\"master\"},{\"id\":582822129,\"name\":\"nanoGPT\",\"full_name\":\"karpathy/nanoGPT\",\"description\":\"The simplest, fastest repository for training/finetuning medium-sized GPTs.\",\"html_url\":\"https://github.com/karpathy/nanoGPT\",\"language\":\"Python\",\"stargazers_count\":56738,\"forks_count\":9705,\"open_issues_count\":343,\"updated_at\":\"2026-04-16T15:21:16Z\",\"created_at\":\"2022-12-28T00:51:12Z\",\"private\":false,\"fork\":false,\"archived\":false,\"default_branch\":\"master\"},{\"id\":1075431749,\"name\":\"nanochat\",\"full_name\":\"karpathy/nanochat\",\"description\":\"The best ChatGPT that $100 can buy.\",\"html_url\":\"https://github.com/karpathy/nanochat\",\"language\":\"Python\",\"stargazers_count\":51962,\"forks_count\":6905,\"open_issues_count\":92,\"updated_at\":\"2026-04-16T15:36:58Z\",\"created_at\":\"2025-10-13T13:46:35Z\",\"private\":false,\"fork\":false,\"archived\":false,\"default_branch\":\"master\"},{\"id\":806315903,\"name\":\"LLM101n\",\"full_name\":\"karpathy/LLM101n\",\"description\":\"LLM101n: Let's build a 
Storyteller\",\"html_url\":\"https://github.com/karpathy/LLM101n\",\"stargazers_count\":36787,\"forks_count\":2010,\"open_issues_count\":19,\"updated_at\":\"2026-04-16T15:29:21Z\",\"created_at\":\"2024-05-27T00:23:38Z\",\"private\":false,\"fork\":false,\"archived\":true,\"default_branch\":\"master\"},{\"id\":783833344,\"name\":\"llm.c\",\"full_name\":\"karpathy/llm.c\",\"description\":\"LLM training in simple, raw C/CUDA\",\"html_url\":\"https://github.com/karpathy/llm.c\",\"language\":\"Cuda\",\"stargazers_count\":29595,\"forks_count\":3528,\"open_issues_count\":218,\"updated_at\":\"2026-04-16T15:20:01Z\",\"created_at\":\"2024-04-08T16:58:11Z\",\"private\":false,\"fork\":false,\"archived\":false,\"default_branch\":\"master\"}]}","annotations":null,"meta":null}],"structuredContent":null,"isError":false}

[REASONING]
I have successfully retrieved the 5 most starred repositories for the user `karpathy`.

The results are:
1.  **autoresearch**: AI agents running research on single-GPU nanochat training automatically (Python)
2.  **nanoGPT**: The simplest, fastest repository for training/finetuning medium-sized GPTs. (Python)
3.  **nanochat**: The best ChatGPT that $100 can buy. (Python)
4.  **LLM101n**: LLM101n: Let's build a Storyteller (N/A, but the description is provided)
5.  **llm.c**: LLM training in simple, raw C/CUDA (Cuda)

The user requested a "Very short answer" listing the 5 projects and their focus. I can compile this information directly from the `description` field provided in the API response.

Plan: Format the extracted data into a concise list.

[ASSISTANT]
The 5 most popular projects by karpathy, and their focus, are:

1.  **autoresearch**: AI agents running research on single-GPU nanochat training automatically.
2.  **nanoGPT**: The simplest, fastest repository for training/finetuning medium-sized GPTs.
3.  **nanochat**: A highly capable local ChatGPT implementation.
4.  **LLM101n**: A project focused on building a Storyteller using large language models.
5.  **llm.c**: Implementing LLM training using raw C/CUDA for efficiency.

Well done! 🎉

In this notebook we built four things with Gemma 4 and Haystack: a RAG pipeline over Wikipedia, an image QA pipeline on a PDF plot, a multimodal weather agent that reads a map, and a GitHub agent that discovers its tools on demand via MCP.

For more, check the Haystack docs, tutorials and cookbook.

(Notebook by Stefano Fiorucci)