Maintained by deepset

Integration: Ollama

Use Ollama models with Haystack. Ollama allows you to get up and running with large language models, locally.

Authors

Alistair Rogers

Sachin Sachdeva

deepset

Introduction
Installation
Usage
- Examples

Introduction

You can use Ollama models in your Haystack pipelines with the OllamaGenerator.

Ollama is a project focused on running Large Language Models locally. Internally it uses the quantized GGUF format by default. This means it is possible to run LLMs on standard machines (even without GPUs), without having to handle complex installation procedures.

Installation

pip install ollama-haystack

Usage

This integration provides 4 components that allow you to leverage Ollama models:

- OllamaGenerator
- OllamaChatGenerator
- OllamaTextEmbedder
- OllamaDocumentEmbedder

To use an Ollama model:

1. Follow the instructions on the Ollama GitHub page to pull and serve your model of choice.
2. Initialize one of the Ollama generators with the name of the model served in your Ollama instance.
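Before initializing a generator, it can help to confirm that the model name you plan to pass is actually available on the server. The sketch below is independent of Haystack and assumes the Ollama REST endpoint /api/tags, which returns the locally pulled models under a "models" key. The helper names and the sample payload are hypothetical, and the untagged-name handling assumes Ollama's default of resolving a bare name to its ":latest" tag.

```python
import json
import urllib.request


def served_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def normalize(model: str) -> str:
    """Assume an untagged name refers to the ':latest' tag."""
    return model if ":" in model else f"{model}:latest"


def is_model_served(tags_payload: dict, model: str) -> bool:
    """Return True if `model` is among the models listed in the payload."""
    served = {normalize(name) for name in served_model_names(tags_payload)}
    return normalize(model) in served


def fetch_tags(url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models it has pulled locally."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Hand-written payload shaped like an /api/tags response (not real server output)
sample = {"models": [{"name": "orca-mini:latest"}, {"name": "nomic-embed-text:latest"}]}
print(is_model_served(sample, "orca-mini"))    # True
print(is_model_served(sample, "llama3.1:8b"))  # False
```

Against a live server, is_model_served(fetch_tags(), "orca-mini") performs the same check on the real model list.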

Examples

To run the example, you may choose to run a Docker container serving an Ollama model of your choice. The following commands work with this example:

docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
docker exec ollama ollama pull orca-mini

Text Generation

Below is an example of a generative question-answering pipeline using RAG with PromptBuilder and OllamaGenerator:

from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

from haystack_integrations.components.generators.ollama import OllamaGenerator

document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="Super Mario was an important politician"),
        Document(content="Mario owns several castles and uses them to conduct important political business"),
        Document(
            content="Super Mario was a successful military leader who fought off several invasion attempts by "
            "his arch rival - Bowser"
        ),
    ]
)

template = """
Given only the following information, answer the question.
Ignore your own knowledge.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OllamaGenerator(model="orca-mini", url="http://localhost:11434"))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

query = "Who is Super Mario?"

response = pipe.run({"prompt_builder": {"query": query}, "retriever": {"query": query}})

print(response["llm"]["replies"])

You should receive an output like (output is not deterministic):

['Based on the information provided, Super Mario is a successful military leader who fought off several invasion attempts by his arch rival - Bowser. He is also an important politician and owns several castles where he conducts political business. Therefore, it can be inferred that Super Mario is a combination of both a military leader and an important politician.']
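If you want replies that vary less between runs, one option is to pass sampling options through generation_kwargs. The sketch below is a minimal illustration, not a recommended configuration: the option values are assumptions, build_generator is a hypothetical helper, and whether a fixed seed gives fully repeatable output depends on the model and the Ollama version.

```python
# Ollama sampling options; the specific values are illustrative assumptions.
generation_kwargs = {
    "temperature": 0.0,  # minimize sampling randomness
    "seed": 42,          # fixed seed for the sampler
    "num_predict": 200,  # upper bound on generated tokens
}


def build_generator(model: str = "orca-mini", url: str = "http://localhost:11434"):
    """Create an OllamaGenerator with the options above.

    The import is local so this module still loads where
    ollama-haystack is not installed.
    """
    from haystack_integrations.components.generators.ollama import OllamaGenerator

    return OllamaGenerator(model=model, url=url, generation_kwargs=generation_kwargs)
```

The returned component can replace the OllamaGenerator passed to pipe.add_component("llm", ...) in the pipeline above.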

Chat Generation

from haystack.dataclasses import ChatMessage

from haystack_integrations.components.generators.ollama import OllamaChatGenerator

messages = [
    ChatMessage.from_user("What's Natural Language Processing?"),
    ChatMessage.from_assistant(
        "Natural Language Processing (NLP) is a field of computer science and artificial "
        "intelligence concerned with the interaction between computers and human language"
    ),
    ChatMessage.from_user("How do I get started?"),
]
client = OllamaChatGenerator(model="orca-mini", timeout=45, url="http://localhost:11434")

response = client.run(messages, generation_kwargs={"temperature": 0.2})

print(response["replies"][0].text)

You should receive an output like (output is not deterministic):

Natural Language Processing (NLP) is a complex field with many different tools and techniques to learn. Here are some steps you can take to get started:

1. Understand the basics of natural language processing: Before diving into the specifics of NLP, it's important to have a basic understanding of what natural language is and how it works. You can start by reading up on linguistics and semantics.

2. Learn about the different components of NLP: There are several components of NLP that you need to understand, including syntax, semantics, morphology, and pragmatics. You can start by learning about these components individually.

3. Choose a tool or library to use: There are many different tools and libraries available for NLP, such as NLTK, spaCy, and Stanford CoreNLP. Choose one that you feel comfortable working with and that fits your needs.

4. Practice: The best way to learn NLP is by practicing. Start with simple tasks like sentiment analysis or tokenization and work your way up to more complex ones like machine translation

Tool Calling

OllamaChatGenerator supports tool calling natively. Pass Tool instances via the tools parameter; the generator returns ToolCall entries on replies[0].tool_calls when the model decides to invoke a tool.

For reliable tool-call emission with Llama 3.1 8B, set temperature=0.0 and use a directive prompt that names the tool.

from haystack.dataclasses import ChatMessage
from haystack.tools import tool
from haystack_integrations.components.generators.ollama import OllamaChatGenerator


@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"


generator = OllamaChatGenerator(
    model="llama3.1:8b",
    generation_kwargs={"temperature": 0.0},
    tools=[get_weather],
)

response = generator.run(
    messages=[ChatMessage.from_user(
        "What's the weather in Berlin? Use the get_weather tool."
    )]
)
print(response["replies"][0].tool_calls)
# -> [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'}, ...)]

Tool execution and multi-turn tool-result handling are covered in the OllamaChatGenerator component reference.
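As a rough idea of what tool execution involves, the following plain-Python sketch dispatches a tool call to the matching function. It does not use Haystack: FakeToolCall is a hypothetical stand-in that only mirrors the tool_name and arguments fields shown in the output above, and TOOLS and execute are illustrative names, not part of the integration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeToolCall:
    """Hypothetical stand-in with the same tool_name/arguments fields as a ToolCall."""

    tool_name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"


# Registry mapping tool names to the Python functions that implement them
TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def execute(call: FakeToolCall) -> str:
    """Look up the requested tool and call it with the model-provided arguments."""
    try:
        func = TOOLS[call.tool_name]
    except KeyError:
        raise ValueError(f"Unknown tool: {call.tool_name}") from None
    return func(**call.arguments)


result = execute(FakeToolCall("get_weather", {"city": "Berlin"}))
print(result)  # Sunny, 22°C in Berlin
```

In a multi-turn exchange, the returned string would then be sent back to the model in a follow-up message so it can phrase the final answer.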

Embedders

OllamaDocumentEmbedder computes embeddings for a list of Documents and stores each vector in the corresponding Document’s embedding field.
OllamaTextEmbedder computes the embedding of a single string.

Both OllamaTextEmbedder and OllamaDocumentEmbedder use embedding models compatible with the Ollama Library.

To run the example below, use the following commands to serve a nomic-embed-text model from Ollama. If the ollama container from the earlier example is still running, skip the first command:

docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
docker exec ollama ollama pull nomic-embed-text

Below is an example that uses both OllamaDocumentEmbedder and OllamaTextEmbedder.

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama.document_embedder import OllamaDocumentEmbedder
from haystack_integrations.components.embedders.ollama.text_embedder import OllamaTextEmbedder

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="My name is Wolfgang and I live in Berlin"),
]

document_embedder = OllamaDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OllamaTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
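The document store above is configured with embedding_similarity_function="cosine", so retrieval ranks documents by the cosine similarity between the query embedding and each document embedding. The sketch below shows that computation in plain Python. It is not the document store's actual implementation, and the 3-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made-up values)
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "horse": [0.0, 1.0, 0.0],
    "cities": [1.0, 0.0, 0.0],
    "berlin": [1.0, 0.0, 1.0],
}

# Rank document keys from most to least similar to the query
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)  # ['berlin', 'cities', 'horse']
```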