Using Mem0 Memory Store with Haystack Agents


Last Updated: February 12, 2026

Mem0 is a managed memory layer for AI agents. Instead of passing the entire conversation history to an LLM on every turn, Mem0 extracts key facts from conversations and stores them as compact memories.

At a high level, Mem0 manages a cycle of extraction, consolidation, and retrieval. When new messages arrive, relevant facts are identified and stored. Over time, memories are merged, updated, or allowed to fade if they lose relevance. When the agent later needs context, Mem0 surfaces only the memories most relevant to the current query, helping to keep token usage and latency low.
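To make the cycle concrete, here is a minimal toy sketch of extraction, consolidation, and retrieval. Everything in it is an assumption made for illustration: `ToyMemory`, the sentence-splitting "extraction", and the word-overlap threshold are hypothetical stand-ins, not how Mem0 works internally.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercase word set used for a crude similarity check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in; not Mem0's implementation."""

    facts: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        # "Extraction": treat every sentence as one candidate fact.
        for sentence in re.split(r"(?<=[.!?])\s+", message.strip()):
            if not sentence:
                continue
            new = _tokens(sentence)
            # "Consolidation": overwrite an existing fact that overlaps heavily.
            for i, old in enumerate(self.facts):
                old_tokens = _tokens(old)
                overlap = len(new & old_tokens) / max(len(new | old_tokens), 1)
                if overlap >= 0.6:
                    self.facts[i] = sentence
                    break
            else:
                self.facts.append(sentence)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # "Retrieval": return the facts sharing the most words with the query.
        q = _tokens(query)
        ranked = sorted(self.facts, key=lambda f: len(q & _tokens(f)), reverse=True)
        return ranked[:top_k]


memory = ToyMemory()
memory.add("I live in Florence. I love mountains.")
memory.add("I live in Florence now.")  # updates the earlier fact instead of duplicating it
print(memory.facts)
print(memory.search("Where do I live?", top_k=1))
```

The second `add` call replaces the earlier Florence fact rather than storing a near-duplicate, which is the kind of update the consolidation step is meant to handle.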

In this notebook, we will:

- Set up a Mem0MemoryStore and add memories about a user
- Inspect what Mem0 actually stored
- Create a Haystack Agent that uses the memory store
- Ask the Agent personalized questions and see how it leverages stored memories

Note: The Mem0 integration with Haystack lives in the haystack-experimental package and is currently experimental. The API may change in future releases. For the latest status, check the haystack-experimental repository.

Install the required dependencies

We need haystack-ai for the core framework, haystack-experimental for the Mem0 memory store integration and the experimental Agent, and mem0ai as the underlying Mem0 client library.

!pip install haystack-ai haystack-experimental mem0ai
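If you want to confirm that the installation worked before continuing, a small check like the one below may help. It is a sketch that relies only on the standard library; `installed_versions` is a hypothetical helper name, and the distribution names are taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["haystack-ai", "haystack-experimental", "mem0ai"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```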

Set up API keys

This notebook requires two API keys:

- OPENAI_API_KEY: Used by the OpenAIChatGenerator to power the Agent’s LLM.
- MEM0_API_KEY: Used by Mem0MemoryStore to connect to the Mem0 Platform. You can get a free API key by signing up.

import os
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

if not os.environ.get("MEM0_API_KEY"):
    os.environ["MEM0_API_KEY"] = getpass("Enter your Mem0 API key: ")
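The getpass prompt suits an interactive notebook. If you run the same steps as a script, where nobody can answer a prompt, a non-interactive check might be preferable. The sketch below is one way to do that; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here for illustration.

```python
import os
from collections.abc import Mapping

# The two keys this notebook needs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MEM0_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or empty in the given environment."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Passing an explicit mapping makes the check easy to try without touching os.environ.
print(missing_keys({"OPENAI_API_KEY": "sk-example"}))
```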

Create the Mem0 Memory Store

The Mem0MemoryStore connects to the Mem0 Platform using the MEM0_API_KEY environment variable. Each memory is associated with a user_id, which allows Mem0 to maintain separate memory spaces for different users.

We will add several facts about a user: their preferences, background, and work context. Mem0 will extract the key facts from these messages and store them as structured memories. When the Agent later queries the memory store, only the most relevant memories will be retrieved rather than replaying the full conversation history.

from haystack.dataclasses import ChatMessage
from haystack_experimental.memory_stores.mem0 import Mem0MemoryStore

memory_store = Mem0MemoryStore()

messages = [
    ChatMessage.from_user("I like to listen to Russian pop music"),
    ChatMessage.from_user("I liked cold spanish latte with oat milk"),
    ChatMessage.from_user("I live in Florence Italy and I love mountains"),
    ChatMessage.from_user(
        "I am a software engineer and I like building application in python. "
        "Most of my projects are related to NLP and LLM agents. "
        "I find it easier to use Haystack framework to build my projects."
    ),
    ChatMessage.from_user(
        "I work in a startup and I am the CEO of the company. "
        "I have a team of 10 people and we are building a platform "
        "for small businesses to manage their customers and sales."
    ),
]

memory_store.add_memories(user_id="agent_example", messages=messages)

[{'memory_id': '31eb96d0-c679-4a2c-8b65-ec617c3d7e17',
  'memory': 'User likes listening to Russian pop music.'},
 {'memory_id': '503f44b3-45ed-4a7d-bb0b-d5cfaf536e32',
  'memory': 'User enjoyed a cold Spanish latte with oat milk.'},
 {'memory_id': '7f757c93-c00e-4972-a9b5-ce20a8f47efd',
  'memory': 'User lives in Florence, Italy, and loves mountains.'},
 {'memory_id': 'f68058f0-93ef-471d-9092-f78c155c0f07',
  'memory': 'User is a software engineer and CEO of a startup with 10 people, building a platform for small businesses to manage customers and sales, and prefers Python and the Haystack framework for NLP and LLM agent projects.'}]
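To picture how a user_id keeps memory spaces separate, consider the toy sketch below. `ScopedMemory` is a hypothetical stand-in written for this illustration. It only shows the scoping idea and says nothing about how Mem0 stores data.

```python
from collections import defaultdict


class ScopedMemory:
    """Hypothetical stand-in: one list of facts per user_id."""

    def __init__(self) -> None:
        self._by_user = defaultdict(list)

    def add(self, user_id: str, fact: str) -> None:
        self._by_user[user_id].append(fact)

    def all_for(self, user_id: str) -> list[str]:
        # Use .get so that looking up an unknown user does not create an entry.
        return list(self._by_user.get(user_id, []))


store = ScopedMemory()
store.add("alice", "Likes tea.")
store.add("bob", "Likes coffee.")

print(store.all_for("alice"))  # only Alice's facts
print(store.all_for("carol"))  # unknown user: nothing stored
```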

Inspect Stored Memories

Before connecting the memory store to an Agent, let’s see what Mem0 actually stored. The search_memories method lets us query the memory store and see which memories are retrieved for a given query. This is useful for understanding how Mem0 extracts and condenses information from raw messages.

results = memory_store.search_memories(
    query="What programming tools does this person use?",
    user_id="agent_example",
    top_k=3,
)

for message in results:
    print(message.text)

User is a software engineer and CEO of a startup with 10 people, building a platform for small businesses to manage customers and sales, and prefers Python and the Haystack framework for NLP and LLM agent projects.
User likes listening to Russian pop music.
User enjoyed a cold Spanish latte with oat milk.

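Notice how Mem0 has rephrased the original messages into third-person facts and merged the two work-related messages into a single memory. The search returns up to top_k results, with the memory about programming tools first. The other two results are unrelated to the query and appear only because we asked for three.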
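The effect of top_k can be reproduced with a toy ranking. The sketch below scores memories by word overlap with the query, which is an assumption made purely for illustration. Mem0's actual ranking is not shown in this notebook, and `overlap_score` and `top_k_memories` are hypothetical helpers. The last memory is abbreviated from the one stored above.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(query: str, memory: str) -> int:
    """Number of distinct words shared by the query and the memory."""
    return len(_words(query) & _words(memory))


def top_k_memories(query: str, memories: list[str], top_k: int = 3) -> list[tuple[int, str]]:
    """Return the top_k (score, memory) pairs, best score first."""
    scored = [(overlap_score(query, m), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:top_k]


memories = [
    "User likes listening to Russian pop music.",
    "User enjoyed a cold Spanish latte with oat milk.",
    "User lives in Florence, Italy, and loves mountains.",
    "User prefers Python and the Haystack framework.",
]

results = top_k_memories("Which Python framework does the user prefer?", memories)
for score, memory in results:
    print(score, memory)
```

Only one memory scores well, yet three come back, because top_k fixes how many results are returned rather than how relevant they must be.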

Create the Agent with Memory

Now we create a Haystack Agent that uses an OpenAIChatGenerator as its LLM and the Mem0MemoryStore as its memory.

When the Agent runs, it will:

1. Search the memory store using the user’s message as a query
2. Inject relevant memories into the conversation as context
3. Generate a response that takes those memories into account
4. Save new messages back to the memory store automatically

from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack_experimental.components.agents.agent import Agent

chat_generator = OpenAIChatGenerator(model="gpt-5-mini")

agent = Agent(
    chat_generator=chat_generator,
    memory_store=memory_store,
)

No tools provided to the Agent. The Agent will behave like a ChatGenerator and only return text responses. To enable tool usage, pass tools directly to the Agent, not to the chat_generator.
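The steps the Agent performs on each run (search, inject, generate, save) can be pictured with plain functions. This is a sketch under stated assumptions, not the experimental Agent's implementation: `run_with_memory`, the dictionary-backed store, and the stub generator are all hypothetical, and exactly which messages the real Agent saves is not specified in this notebook.

```python
from typing import Callable


def run_with_memory(
    user_message: str,
    user_id: str,
    search: Callable[[str, str, int], list[str]],
    generate: Callable[[str], str],
    save: Callable[[str, str], None],
    top_k: int = 3,
) -> str:
    memories = search(user_message, user_id, top_k)        # 1. search
    context = "\n".join(f"- {m}" for m in memories)        # 2. inject as context
    prompt = f"Known facts about the user:\n{context}\n\nUser: {user_message}"
    reply = generate(prompt)                               # 3. generate
    save(user_id, user_message)                            # 4. save (simplified: user message only)
    return reply


# Hypothetical stand-ins for the memory store and the LLM.
store = {"agent_example": ["User prefers Python and the Haystack framework."]}


def search(query: str, user_id: str, top_k: int) -> list[str]:
    return store.get(user_id, [])[:top_k]


def generate(prompt: str) -> str:
    return "STUB REPLY based on:\n" + prompt


def save(user_id: str, message: str) -> None:
    store.setdefault(user_id, []).append(message)


reply = run_with_memory("Which framework should I use?", "agent_example", search, generate, save)
print(reply)
```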

Ask a Personalized Question

Let’s ask the Agent a question that it can only answer well if it remembers the user’s preferences. Based on the stored memories, the Agent knows the user is a Python developer who uses Haystack and works on NLP/LLM projects, so it should tailor its recommendation accordingly.

Note the memory_store_kwargs parameter: this is where we pass the user_id so the Agent knows which user’s memories to search. Behind the scenes, the Agent will query Mem0 with the user’s message, retrieve relevant memories, inject them into the prompt as additional context, and then generate a response.

result = agent.run(
    messages=[
        ChatMessage.from_user(
            "Based on what you know about me, which framework should I use to design an AI travel agent?"
        )
    ],
    memory_store_kwargs={"user_id": "agent_example"},
)

print(result["last_message"].text)

Short answer: given that you're a Python-preferring engineer who already likes Haystack, use Haystack as the primary framework — and complement it with a vector DB (managed if you want to move fast), an LLM provider (OpenAI or a local LLM), and small orchestration pieces (FastAPI, background workers, connectors to booking APIs). Haystack maps directly to the retrieval + generation + multi‑turn needs of a travel agent and lets you build production pipelines in Python.

Why Haystack fits you (based on what I know about you)
- You prefer Python and already like Haystack for NLP/LLM projects — so lower ramp-up time and predictable productivity.
- Haystack is built around retriever/reader/generation pipelines and conversational state, which matches travel agent needs (knowledge retrieval, multi‑turn booking dialogs, guided form filling).
- It’s modular, so you can replace components (vector DB, LLM) as you scale.

Recommended architecture and components
- Frontend: React / mobile UI talking to a FastAPI backend.
- API & orchestration (Python): FastAPI + Celery/RQ for long-running tasks (ticket search, booking).
- Retrieval layer: Haystack DocumentStore + EmbeddingRetriever. DocumentStore options:
  - Elasticsearch / OpenSearch (good for hybrid search + scale)
  - Milvus / Pinecone / Weaviate (managed vector DB for fast iteration)
  - FAISS (local, good for prototyping)
- Embeddings/LLM:
  - Managed (OpenAI, Anthropic) for fastest reliable quality.
  - Or self‑hosted LLMs (Llama 2 / Mistral via Hugging Face or TGI) if you want cost/control.
- Conversational pipeline: Haystack ConversationalPipeline / GenerativeQAPipeline for multi‑turn QA and context-aware responses.
- Tooling & action execution:
  - Implement booking/payment connectors as custom Haystack nodes or as separate microservices called by the orchestration layer.
  - For tool-like behavior (calling flight/hotel APIs, calendar, payments), either use LLM function-calling (OpenAI) or orchestrate actions server-side based on parsed intents/entities.
- NLU extras: use NER and slot-filling to extract traveler info (dates, origins, passport info), or integrate a light Rasa-style dialogue policy if you want explicit state machines.
- Monitoring & safety: logging, prompt/response auditing, input validation, rate limiting, and consent/privacy handling for PII.

How to prototype quickly (MVP path)
1. Ingest relevant documents: FAQs, travel policies, vendor docs, sample itineraries into Haystack DocumentStore.
2. Plug an EmbeddingRetriever and a generative LLM (OpenAI/gpt-4o or local model) via Haystack.
3. Build a simple ConversationalPipeline that keeps context and can answer queries and ask clarifying questions.
4. Add an action node or endpoint that converts validated slots into an API call to a test booking API (or a mocked service).
5. Iterate on prompts, retrieval settings (top_k), and add a vector DB if you need scale.

When to consider alternatives or complements
- If you want heavy agentic tool chaining out of the box, LangChain has a more mature “agent” ecosystem — but you can integrate LangChain agents with your Python/Harstack stack or implement minimal agent behavior via Haystack nodes.
- If you need rich dialogue policies across many flows, consider combining Haystack for knowledge/RAG with Rasa for fine-grained dialog management.
- If document ingestion/metadata indexing is your main focus, LlamaIndex can simplify some ingestion tasks; you can still use Haystack for runtime pipelines.

Scaling and production tips
- Start with managed vector DB (Pinecone, Weaviate Cloud, Milvus Cloud) to avoid ops overhead.
- Cache common queries/itineraries and batch embedding calls.
- Use a hybrid retriever (sparse + dense) for precision and recall.
- Separate “tell me” flows (informational) from “do” flows (bookings) and require stronger validation/consent for the latter.
- Instrument for user feedback and fine-tune prompts/response reranking based on logs.

If you want, I can:
- Sketch a minimal Haystack-based project structure and sample pipeline code.
- Recommend specific vector DB + LLM combos based on budget and latency requirements.
- Provide a checklist for PII/booking compliance and testing.

Which of those follow-ups would you like next?

Ask a Follow-Up Question

Let’s try a different kind of question, this time about personal preferences rather than technical skills.

result = agent.run(
    messages=[
        ChatMessage.from_user(
            "Can you suggest a vacation destination for me? "
            "I plan to have some time off in June this year."
        )
    ],
    memory_store_kwargs={"user_id": "agent_example"},
)

print(result["last_message"].text)

Sure — I can help. A few quick questions so I can make better suggestions:
- How long is your trip (long weekend, 1 week, 2+ weeks)?
- Where will you be traveling from / how far do you want to fly?
- What's your budget range (economy, mid, luxury)?
- What kind of trip do you prefer: beach, city & culture, nature/hiking, food/wine, relaxation, active/adventure, family-friendly?
- Who are you traveling with (solo, couple, friends, kids)?
- Any health/visa/other constraints I should know about?

If you want ideas now, here are several good June options across tastes and budgets, with why June is a good time to go:

- Santorini or Crete, Greece — warm weather, sunny beaches, great food and island vibes. Early–mid June is before peak late‑July crowds and still reliably sunny.
- Dubrovnik & Hvar, Croatia — beautiful coast, historic Old Town, island hopping and clear seas. June has pleasant temperatures and long daylight.
- Iceland (Ring Road / Golden Circle) — near‑24‑hour daylight, waterfalls, glaciers and dramatic landscapes; ideal for road trips and outdoor photography.
- Norwegian fjords (Bergen, Aurland, Geiranger) — hiking and scenic drives with mild weather and the midnight sun; fewer mosquitoes than later summer.
- Canadian Rockies (Banff / Jasper) — turquoise lakes, alpine hiking, wildlife viewing with snow mostly melted at higher trails by June.
- Azores, Portugal — lush landscapes, whale/sea life watching, hiking and geothermal baths; cooler, green and less crowded than mainland Portugal.
- Hokkaido, Japan — pleasant cool weather, great seafood and outdoor activities; avoids Japan’s main rainy season which hits Honshu in June.
- Algarve (Portugal) or Amalfi Coast (Italy) — classic June sun/beach + coastal villages; book lodging early as tourism ramps up.

If you tell me your answers to the questions above, I’ll recommend 2–3 tailored destinations and can suggest a sample itinerary, best neighborhoods, and packing tips. Which direction do you want to go?

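Ideally, the Agent references the user’s love of mountains and their location in Florence, Italy, which would show that Mem0 retrieves different memories depending on the query context: a question about frameworks surfaces technical memories, while a question about vacations surfaces lifestyle memories. In the run shown above, the response is fairly generic and even asks where the user will be traveling from, so the retrieved memories were not clearly used. LLM outputs vary between runs, so your result may differ.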

Moreover, new messages are automatically passed to the memory store, so now it should also remember when the user plans their vacation.

results = memory_store.search_memories(
    query="What are the vacation plans?",
    user_id="agent_example",
    top_k=3,
)

for message in results:
    print(message.text)

User plans to have time off in June 2026
User is a software engineer and CEO of a startup with 10 people, building a platform for small businesses to manage customers and sales, and prefers Python and the Haystack framework for NLP and LLM agent projects.
User lives in Florence, Italy, and loves mountains.

Memory vs. Retrieval

If you have used Haystack for RAG, you might wonder how a memory store differs from a document retriever. While both supply additional context to the LLM, they serve different purposes:

- What is stored: A retriever pulls chunks from a knowledge base built on top of your documents. A memory store holds distilled facts learned from conversations, such as user preferences, past decisions, and stated goals, rather than raw documents.
- Who it belongs to: Retrieved documents are typically shared across all users. Memories are scoped to a specific user, agent, or session, enabling personalization.
- How it evolves: A document store changes only when you explicitly index new content. A memory store evolves automatically as the agent converses: new facts are extracted, conflicting ones are updated, and low-relevance memories decay over time.

In practice, the two are complementary. A Haystack Agent can use a retriever to answer factual questions from your knowledge base while using a memory store to remember who it is talking to and what they care about.
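As a rough illustration of how the two sources can feed one prompt, the sketch below combines a shared knowledge base with per-user memories. All data and names (`KNOWLEDGE_BASE`, `MEMORIES`, `build_prompt`) are hypothetical, and the keyword matching stands in for a real retriever.

```python
# Shared across all users (what a retriever would search).
KNOWLEDGE_BASE = [
    "Refunds are processed within 5 business days.",
    "Support is available on weekdays from 9 to 17 CET.",
]

# Scoped per user (what a memory store would hold).
MEMORIES = {
    "alice": ["Prefers short answers."],
    "bob": ["Prefers detailed step-by-step answers."],
}


def build_prompt(question: str, user_id: str) -> str:
    """Combine matching shared documents with the asking user's memories."""
    words = set(question.lower().replace("?", "").split())
    docs = [d for d in KNOWLEDGE_BASE if words & set(d.lower().rstrip(".").split())]
    facts = MEMORIES.get(user_id, [])
    return "\n".join(["Documents:", *docs, "User facts:", *facts, f"Question: {question}"])


question = "How long do refunds take?"
prompt_alice = build_prompt(question, "alice")
prompt_bob = build_prompt(question, "bob")
print(prompt_alice)
```

Both users receive the same document, while the user facts differ, which mirrors the split between shared knowledge and personal memory.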

Clean Up

Since memories are stored in your Mem0 account, let’s clean up the memories we created in this notebook to avoid leaving orphaned data.

memory_store.delete_all_memories(user_id="agent_example")
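If a cell fails midway, the cleanup call above may never run. One possible pattern is to wrap the work in a context manager so deletion happens even on errors. The sketch below assumes only that the store exposes delete_all_memories(user_id=...), as used in this notebook. `temporary_memories` and `FakeStore` are hypothetical names, with the fake store standing in for a real Mem0MemoryStore so the example runs without an API key.

```python
from contextlib import contextmanager


@contextmanager
def temporary_memories(store, user_id: str):
    """Yield the store, then delete the user's memories even if the body raises."""
    try:
        yield store
    finally:
        store.delete_all_memories(user_id=user_id)


class FakeStore:
    """Hypothetical stand-in that records deletions instead of calling Mem0."""

    def __init__(self) -> None:
        self.deleted: list[str] = []

    def delete_all_memories(self, user_id: str) -> None:
        self.deleted.append(user_id)


fake = FakeStore()
try:
    with temporary_memories(fake, "agent_example"):
        raise RuntimeError("simulated failure inside the notebook")
except RuntimeError:
    pass

print(fake.deleted)  # cleanup ran despite the error
```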