OllamaGenerator

A component that provides an interface to generate text using an LLM running on Ollama.

Name: OllamaGenerator
Folder Path: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama
Most common Position in a Pipeline: After a PromptBuilder
Mandatory Input variables: "prompt": a string containing the prompt for the LLM
Output variables: "replies": a list of strings with all the replies generated by the LLM

"meta": a list of dictionaries with the metadata associated with each reply, such as token count and others

Overview

OllamaGenerator provides an interface to generate text using an LLM running on Ollama.

OllamaGenerator needs a model name and a URL to work. By default, it uses the "orca-mini" model and the "http://localhost:11434/api/generate" URL.

Ollama is a project focused on running LLMs locally. Internally, it uses the quantized GGUF format by default. This means it is possible to run LLMs on standard machines (even without GPUs) without having to go through complex installation procedures.
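To make the default model name and URL more concrete, here is a minimal sketch of what a direct call to Ollama's generate endpoint could look like using only the Python standard library. It is not how OllamaGenerator is implemented; the helper names `build_generate_request` and `send` are hypothetical, and the payload fields (`model`, `prompt`, `stream`, `options`) follow the Ollama REST API as we understand it.

```python
import json
import urllib.request

# Defaults mentioned in the overview above.
DEFAULT_URL = "http://localhost:11434/api/generate"
DEFAULT_MODEL = "orca-mini"


def build_generate_request(prompt, model=DEFAULT_MODEL, url=DEFAULT_URL, generation_kwargs=None):
    """Build (but do not send) a POST request for Ollama's generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a token stream
        "options": generation_kwargs or {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request. Requires a running Ollama server with the model pulled."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Building the request works offline; only send() needs a server.
request = build_generate_request("Why is the sky blue?", generation_kwargs={"temperature": 0.9})
body = json.loads(request.data)
print(request.full_url, body["model"])
```

Calling `send(request)` should return the generated text once Ollama is running locally and the model has been pulled.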

Streaming

OllamaGenerator supports streaming the tokens from the LLM directly to the output. To do so, pass a function to the streaming_callback init parameter.
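As a rough illustration, a streaming callback can be any callable that receives one chunk at a time. The sketch below assumes each chunk exposes its text through a `content` attribute, as Haystack's StreamingChunk does. `TokenCollector` and `build_streaming_generator` are hypothetical names, and the stand-in chunks at the end only simulate a stream so the callback can be tried without a running Ollama server.

```python
from types import SimpleNamespace


class TokenCollector:
    """Callable that prints each streamed chunk and keeps a copy of its text."""

    def __init__(self):
        self.tokens = []

    def __call__(self, chunk):
        # Only the chunk's `content` attribute is used here.
        self.tokens.append(chunk.content)
        print(chunk.content, end="", flush=True)

    @property
    def text(self):
        return "".join(self.tokens)


def build_streaming_generator(collector):
    """Create an OllamaGenerator that streams into `collector`."""
    # Imported lazily so the collector can be tried without ollama-haystack installed.
    from haystack_integrations.components.generators.ollama import OllamaGenerator

    return OllamaGenerator(model="zephyr", streaming_callback=collector)


# Offline check with stand-in chunks (no Ollama server needed).
collector = TokenCollector()
for piece in ["Hello", ", ", "world"]:
    collector(SimpleNamespace(content=piece))
print()
```

With Ollama running, `build_streaming_generator(collector).run("Your prompt")` should print tokens as they arrive, and `collector.text` holds the text received so far.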

Usage

  1. You need a running instance of Ollama. See the Ollama installation instructions to set it up.
    A fast way to run Ollama is using Docker:
docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
  1. You need to download or pull the desired LLM. The model library is available on the Ollama website.
    If you are using Docker, you can, for example, pull the Zephyr model:
docker exec ollama ollama pull zephyr

If you have already installed Ollama in your system, you can execute:

ollama pull zephyr

Choose a specific version of a model

You can also specify a tag to choose a specific (quantized) version of your model. The available tags are shown in the model card of the Ollama models library. For example, to pull a specific quantized version of Zephyr, run:

# ollama pull model:tag
ollama pull zephyr:7b-alpha-q3_K_S
  1. You also need to install the ollama-haystack package:
pip install ollama-haystack

On its own

Here's how the OllamaGenerator would work just on its own:

from haystack_integrations.components.generators.ollama import OllamaGenerator

generator = OllamaGenerator(model="zephyr",
                            url="http://localhost:11434/api/generate",
                            generation_kwargs={
                              "num_predict": 100,
                              "temperature": 0.9,
                              })

print(generator.run("Who is the best American actor?"))

# {'replies': ['I do not have the ability to form opinions or preferences.
# However, some of the most acclaimed american actors in recent years include
# denzel washington, tom hanks, leonardo dicaprio, matthew mcconaughey...'],
# 'meta': [{'model': 'zephyr', ...}]}
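Since "replies" and "meta" are parallel lists, it can be handy to read them together. The following is a small sketch, with `first_reply` as a hypothetical helper and `sample` as a stand-in dictionary that mirrors the output shape shown above rather than a real model response.

```python
def first_reply(result):
    """Return (text, meta) for the first reply in an OllamaGenerator result dict."""
    replies = result.get("replies", [])
    meta = result.get("meta", [])
    if not replies:
        return None, {}
    return replies[0], (meta[0] if meta else {})


# Stand-in result with the documented shape (not real model output).
sample = {
    "replies": ["Some acclaimed American actors include ..."],
    "meta": [{"model": "zephyr"}],
}

text, info = first_reply(sample)
print(info.get("model"), "->", text)
```

The same helper can be applied to a pipeline result by passing the generator's entry, for example `first_reply(result["llm"])`.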

In a Pipeline

from haystack_integrations.components.generators.ollama import OllamaGenerator

from haystack import Pipeline, Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}?
"""

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="I really like summer"),
                          Document(content="My favorite sport is soccer"),
                          Document(content="I don't like reading sci-fi books"),
                          Document(content="I don't like crowded places"),])

generator = OllamaGenerator(model="zephyr",
                            url="http://localhost:11434/api/generate",
                            generation_kwargs={
                              "num_predict": 100,
                              "temperature": 0.9,
                              })

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

# Example query; the template appends the question mark
query = "What do I like"

result = pipe.run({"prompt_builder": {"query": query},
                   "retriever": {"query": query}})

print(result)

# {'llm': {'replies': ['Based on the provided context, it seems that you enjoy
# soccer and summer. Unfortunately, there is no direct information given about 
# what else you enjoy...'],
# 'meta': [{'model': 'zephyr', ...}]}}

Related Links

Check out the API reference in the GitHub repo or in our docs: