Integration: Marqo

A Document Store for storing documents in Marqo and retrieving them from it, built for Haystack 2.0

Authors
marqo-ai

Overview

This integration lets you use Marqo as the document store for your Haystack pipelines. This page shows how to start Marqo and how to initialize a MarqoDocumentStore that can be used in any Haystack 2.0 pipeline.

Installation

pip install marqo-haystack

Usage

Once installed, you can start using your Marqo database with Haystack 2.0. The MarqoDocumentStore is compatible with the open-source Marqo Docker container and with the Marqo managed cloud offering.

Getting Started Locally with the Marqo Docker Container

For x86 machines

docker pull marqoai/marqo:latest
docker rm -f marqo
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest

For M1/M2 ARM machines

docker rm -f marqo-os; docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" marqoai/marqo-os:0.0.3-arm

Next, in a new terminal:

docker rm -f marqo; docker run --name marqo --privileged \
    -p 8882:8882 --add-host host.docker.internal:host-gateway \
    -e "OPENSEARCH_URL=https://localhost:9200" \
    marqoai/marqo:latest
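
The container can take a while to start accepting connections. As a minimal sketch (not part of the integration), the helper below polls the port published by the commands above until a TCP connection succeeds. The function name `wait_for_port` and its default timeout are illustrative assumptions; an open port only shows that something is listening on 8882, not that Marqo has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8882, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: 8882 is the port published by the docker commands above.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

Calling `wait_for_port()` before creating the document store avoids a connection error while the container is still starting.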

Getting Started with Marqo Cloud

Log in or create an account at https://cloud.marqo.ai. Create a new index with the indexing mode set as “Text-optimised”.

Initializing a MarqoDocumentStore in Haystack

from marqo_haystack import MarqoDocumentStore
 
document_store = MarqoDocumentStore()

If you are using the Docker container, this uses an index called documents. If the index doesn't exist, it is created.

If you are using Marqo Cloud, you can connect to an existing index as follows:

from marqo_haystack import MarqoDocumentStore
 
document_store = MarqoDocumentStore(
    url="https://api.marqo.ai",
    api_key="XXXXXXXXXXXXX",
    collection_name="my-cloud-index"
)
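
To keep the API key out of source code, the connection settings can be read from the environment. This is a minimal sketch under stated assumptions: the helper `marqo_cloud_kwargs` and the variable names `MARQO_API_KEY`, `MARQO_URL`, and `MARQO_INDEX` are hypothetical and not defined by the integration. Only the three keyword arguments shown above are used.

```python
import os
from typing import Dict, Mapping


def marqo_cloud_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build keyword arguments for MarqoDocumentStore from environment variables.

    The environment variable names are assumptions made for this example.
    """
    api_key = env.get("MARQO_API_KEY")
    if not api_key:
        raise RuntimeError("Set MARQO_API_KEY before connecting to Marqo Cloud")
    return {
        # Fall back to the URL and index name used in the example above.
        "url": env.get("MARQO_URL", "https://api.marqo.ai"),
        "api_key": api_key,
        "collection_name": env.get("MARQO_INDEX", "my-cloud-index"),
    }
```

The result can then be unpacked into the constructor: `document_store = MarqoDocumentStore(**marqo_cloud_kwargs())`.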

Writing Documents to MarqoDocumentStore

To write documents to MarqoDocumentStore, create an indexing pipeline.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

# file_paths is assumed to be a list of paths to the text files to index
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"paths": file_paths}})
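
The pipeline above expects `file_paths` to already exist. One way to build it, sketched here with only the standard library, is to collect every text file under a directory. The helper name `collect_text_files` and the `*.txt` pattern are assumptions for illustration.

```python
from pathlib import Path
from typing import List


def collect_text_files(root: str, pattern: str = "*.txt") -> List[str]:
    """Return the sorted paths of files under root that match pattern.

    The search is recursive, so files in subdirectories are included.
    """
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())
```

For example, `file_paths = collect_text_files("data")` gathers the `.txt` files under a hypothetical `data` directory.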

Using the MarqoRetriever

To retrieve documents from your Marqo document store, create a querying pipeline.

To send a single query, use the MarqoSingleRetriever:

from haystack import Pipeline
from marqo_haystack.retriever import MarqoSingleRetriever

querying = Pipeline()
querying.add_component("retriever", MarqoSingleRetriever(document_store))
results = querying.run({"retriever": {"query": "Who is Marco Polo?", "top_k": 3}})

To send a list of queries, use the MarqoRetriever:

from haystack import Pipeline
from marqo_haystack.retriever import MarqoRetriever

querying = Pipeline()
querying.add_component("retriever", MarqoRetriever(document_store))
results = querying.run({"retriever": {"queries": ["Who is Marco Polo?", "Can Hippos swim?"], "top_k": 3}})