
Integration: OPEA
Use the OPEA framework for hardware abstraction and orchestration
Overview
The haystack-opea integration connects Haystack to OPEA (Open Platform for Enterprise AI), a collection of containerized microservices for LLMs, embedding, retrieval, and reranking. By delegating heavy compute to OPEA services, you can build flexible Retrieval-Augmented Generation (RAG) pipelines that scale across cloud, on-prem, and edge deployments.
Key features:
- Hardware-agnostic LLM & embedding services.
- Easy orchestration of LLM, embedder, retriever, and ranker services, among others.
- Support for local development via Docker Compose as well as production clusters.
Installation
Install from source:
git clone https://github.com/opea-project/Haystack-OPEA.git
cd Haystack-OPEA
pip install poetry
poetry install --with test
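To verify the install, you can try importing the components used later in this guide from within the Poetry environment (e.g. poetry run python). This is a minimal sanity check; the class names follow the usage examples below.
# Quick check that the package is importable after installation
from haystack_opea import OPEADocumentEmbedder, OPEAGenerator, OPEATextEmbedder
print(OPEAGenerator)  # should print the class without raising ImportError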
Usage
Below are quickstart examples for embeddings and LLM generation. Make sure your OPEA backend is running, for example via the provided Docker Compose file. OPEA services can be configured to use a variety of model-serving backends such as TGI, vLLM, Ollama, and OVMS, and they offer validated runtime settings for good performance on various hardware, including Intel Gaudi; see the LLM section in the OPEA Components Library.
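Before running the snippets, it can help to confirm the services are reachable. The following is a minimal sketch, assuming the default quickstart ports used below (6006 for embeddings, 9009 for the LLM); the actual ports depend on your Compose setup.
import socket

# Check that the assumed OPEA service ports are accepting connections
for port in (6006, 9009):
    with socket.socket() as s:
        s.settimeout(2)
        status = "open" if s.connect_ex(("localhost", port)) == 0 else "closed"
        print(f"localhost:{port} is {status}")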
Embeddings
from haystack import Document
from haystack_opea import OPEATextEmbedder, OPEADocumentEmbedder
# Text embedding example
text_embedder = OPEATextEmbedder(api_url="http://localhost:6006")
text_embedder.warm_up()
result = text_embedder.run("I love pizza!")
print("Text embedding:", result["vectors"][0])
# Document embedding example
doc = Document(content="I love pizza!")
doc_embedder = OPEADocumentEmbedder(api_url="http://localhost:6006")
doc_embedder.warm_up()
out = doc_embedder.run([doc])
print("Document embedding:", out["documents"][0].embedding)
LLM Generation
from haystack_opea import OPEAGenerator
# Initialize the OPEA LLM service
generator = OPEAGenerator(
api_url="http://localhost:9009",
model_arguments={
"temperature": 0.2,
"top_p": 0.7,
"max_tokens": 512,
},
)
generator.warm_up()
# Run a simple prompt
response = generator.run(prompt="What is the capital of France?")
print("LLM reply:", response["replies"][0])
For more examples, see the samples/ folder, the official OPEA documentation, and the Components Library.