Integration: fastRAG

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines

Authors
Intel Labs

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.

Comments, suggestions, issues and pull-requests are welcomed! ❤️

IMPORTANT

Now compatible with Haystack v2+. Please report any possible issues you find.

📣 Updates

  • 2024-05: fastRAG V3 is Haystack 2.0 compatible 🔥
  • 2023-12: Gaudi2 and ONNX runtime support; Optimized Embedding models; Multi-modality and Chat demos; REPLUG text generation.
  • 2023-06: ColBERT index modification: adding/removing documents.
  • 2023-05: RAG with LLM and dynamic prompt synthesis example.
  • 2023-04: Qdrant DocumentStore support.

Key Features

  • Optimized RAG: Build RAG pipelines with SOTA efficient components for greater compute efficiency.
  • Optimized for Intel Hardware: Leverage Intel extensions for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as efficiently as possible on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators.
  • Customizable: fastRAG is built using Haystack and HuggingFace. All of fastRAG’s components are 100% Haystack compatible.

🚀 Components

For a brief overview of the various unique components in fastRAG refer to the Components Overview page.

LLM Backends

  • Intel Gaudi Accelerators: Running LLMs on Gaudi 2
  • ONNX Runtime: Running LLMs with optimized ONNX-runtime
  • OpenVINO: Running quantized LLMs using OpenVINO
  • Llama-CPP: Running RAG Pipelines with LLMs on a Llama CPP backend

Optimized Components

  • Embedders: Optimized int8 bi-encoders
  • Rankers: Optimized/sparse cross-encoders

RAG-efficient Components

  • ColBERT: Token-based late interaction
  • Fusion-in-Decoder (FiD): Generative multi-document encoder-decoder
  • REPLUG: Improved multi-document decoder
  • PLAID: Incredibly efficient indexing engine
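
As a rough illustration of what "token-based late interaction" means for ColBERT in the list above: the query and the document are each encoded into one vector per token, and the relevance score is the sum, over query tokens, of the best-matching document token similarity (often called MaxSim). The sketch below is pure Python with made-up two-dimensional vectors; the function name maxsim_score is hypothetical, and this is not how fastRAG or ColBERT implement the scoring internally.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, doc_vectors):
    """Late-interaction score: for each query token vector, take the highest
    similarity to any document token vector, then sum those maxima."""
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Toy token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Because each document's token vectors can be computed and indexed ahead of time, only the cheap max-and-sum step depends on the query.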

๐Ÿ“ Installation

Preliminary requirements:

  • Python 3.8 or higher.
  • PyTorch 2.0 or higher.
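
The two requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names major_minor and check_requirements are hypothetical, and it assumes PyTorch is distributed under the package name "torch".

```python
import re
import sys
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.1.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements(min_python=(3, 8), min_torch=(2, 0)):
    """Report whether the Python and PyTorch prerequisites are satisfied."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "torch_ok": False,
    }
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return report  # PyTorch is not installed in this environment
    report["torch_installed"] = True
    report["torch_ok"] = major_minor(torch_version) >= min_torch
    return report


print(check_requirements())
```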

To set up the software, install the package from PyPI, preferably in a newly created virtual environment:

pip install fastrag

There are additional dependencies that you can install based on your specific usage of fastRAG.

For the example below, we need to install extra packages via the following command:

pip install "fastrag[intel,openvino]"

Usage

You can import components from fastRAG and use them in a Haystack pipeline:

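from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.rankers import TransformersSimilarityRanker

from fastrag.generators.openvino import OpenVINOGenerator

# The pipeline below needs a document store named `store`.
# A small in-memory store with sample documents is used here for illustration.
store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Sauron is the main villain in The Lord of the Rings."),
    Document(content="Frodo Baggins carries the One Ring to Mordor."),
])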

prompt_template = """
Given these documents, answer the question.
Documents:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}
Question: {{query}}
Answer:
"""

openvino_compressed_model_path = "path/to/quantized/model"

generator = OpenVINOGenerator(
    model="microsoft/phi-2",
    compressed_model_dir=openvino_compressed_model_path,
    device_openvino="CPU",
    task="text-generation",
    generation_kwargs={
        "max_new_tokens": 100,
    }
)

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipe.add_component("ranker", TransformersSimilarityRanker())
pipe.add_component("prompt_builder", PromptBuilder(template=prompt_template))
pipe.add_component("llm", generator)

pipe.connect("retriever.documents", "ranker.documents")
pipe.connect("ranker", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

query = "Who is the main villain in Lord of the Rings?"
answer_result = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    },
    "ranker": {
        "query": query,
        "top_k": 1
    }
})

print(answer_result["llm"]["replies"][0])
#' Sauron\n'
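
To make the data flow wired up by the connect calls easier to follow, here is a conceptual sketch of the same four stages as plain Python functions. It does not use Haystack or fastRAG: the word-overlap scoring is a toy stand-in for BM25 retrieval and for the cross-encoder ranker, the llm argument is a placeholder callable, and all function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=3):
    """Toy retriever: keep documents that share words with the query."""
    scored = [(len(tokens(query) & tokens(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def rank(query, documents, top_k=1):
    """Toy ranker: reorder by the share of each document's words that match."""
    def score(doc):
        doc_tokens = tokens(doc)
        return len(tokens(query) & doc_tokens) / max(len(doc_tokens), 1)
    return sorted(documents, key=score, reverse=True)[:top_k]


def build_prompt(query, documents):
    """Mirror the prompt template: documents first, then the question."""
    lines = ["Given these documents, answer the question.", "Documents:"]
    lines += [f"    {doc}" for doc in documents]
    lines += [f"Question: {query}", "Answer:"]
    return "\n".join(lines)


def run(query, documents, llm):
    """Pass data through the stages in the same order as the connect() calls."""
    retrieved = retrieve(query, documents)
    ranked = rank(query, retrieved, top_k=1)
    prompt = build_prompt(query, ranked)
    return llm(prompt)


corpus = [
    "Sauron is the main villain in The Lord of the Rings.",
    "Frodo Baggins carries the One Ring to Mordor.",
    "The Hobbit was published in 1937.",
]
query = "Who is the main villain in Lord of the Rings?"

# Placeholder "model" that only echoes the prompt; a real generator would go here.
print(run(query, corpus, llm=lambda prompt: prompt))
```

Note that the query is passed to three stages separately, which is why the run call in the pipeline above supplies it to the retriever, the ranker, and the prompt builder.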

For more examples, check out Example Use Cases.

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.