
Advent of Haystack

Try out Haystack 2.0-Beta to discover what’s coming in the next major release with 10 challenges in the month of December 🎉

From January 5th to 19th, you will also find the solution to each challenge.


The Haystack elves live in the forest. Every year, after winter, Elf Bilge writes a detailed report on their winter preparations, food collection, memorable moments, and the lessons learned. Other curious elves seek her guidance yearly, asking questions like “Which foods should we collect?” or “What should we do against water scarcity?” 🌲

This year, Elf Bilge has an idea: build a generative system that stands in for her, so elves can shoot it questions and get elf-style answers. As she plays with LLMs, she realizes these winter reports are too big to just throw at an LLM in one go. Besides, usually only a few parts of a report are relevant to any given question. Being a Haystack elf, she knows how to solve this issue: PREPROCESSING! 💡

So, she comes up with a plan. Elf Bilge will convert all report files into Haystack Documents, break them into smaller bits, create semantic doodads (embeddings) for them, and toss them into a document store. That way, she can later use these documents in the RAG pipeline behind their generative system. 🌟
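To make the last two steps of the plan concrete, here is a toy, dependency-free sketch of "create embeddings and toss them into a document store", followed by the kind of lookup a RAG pipeline performs later. Every name in it (`Doc`, `toy_embed`, `ToyDocumentStore`) is hypothetical and is not a Haystack class. `toy_embed` is a bag-of-words stand-in that only matches shared words, unlike the semantic embeddings a real embedding model produces, and the two report chunks are made up.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a Haystack Document: a chunk of text plus its vector.
    content: str
    embedding: dict


def toy_embed(text: str) -> dict:
    """Sparse bag-of-words vector normalised to unit length.

    NOT a semantic embedding: it only rewards words that appear in both texts.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


class ToyDocumentStore:
    """Hypothetical in-memory store that ranks documents by cosine similarity."""

    def __init__(self) -> None:
        self.docs: list = []

    def write(self, docs: list) -> int:
        self.docs.extend(docs)
        return len(docs)

    def query(self, question: str, top_k: int = 1) -> list:
        q = toy_embed(question)

        def score(doc: Doc) -> float:
            # Both vectors have unit length, so the dot product equals the cosine.
            return sum(v * doc.embedding.get(w, 0.0) for w, v in q.items())

        return sorted(self.docs, key=score, reverse=True)[:top_k]


chunks = [
    "Collect acorns and berries before the first snow.",
    "Dig a well near the old oak to avoid water scarcity.",
]
store = ToyDocumentStore()
store.write([Doc(c, toy_embed(c)) for c in chunks])
print(store.query("What should we do against water scarcity?")[0].content)
```

The question shares "water" and "scarcity" with the second chunk and no words with the first, so the query returns the second chunk. A real embedding model would also match paraphrases that share no words at all, which is why Elf Bilge wants semantic doodads rather than word overlap.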

For this challenge, you must help Elf Bilge create a pipeline that preprocesses the documents and indexes them, along with their embeddings, into the document store.

🎯 Requirements:

  • Each split should have 200 words, and the overlap size should be 50 words.
  • Use all winter reports (winter_report_one.txt, winter_report_two.pdf, winter_report_three.md)
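With splits of 200 words and an overlap of 50, each chunk starts 150 words after the previous one, so consecutive chunks share 50 words at their boundary and text near a cut keeps some of its context in both chunks. The plain-Python function below sketches that arithmetic; it is not Haystack's implementation. `split_by_words` is a hypothetical helper and whitespace tokenisation is an assumption. Haystack's DocumentSplitter, which presumably takes comparable `split_by="word"`, `split_length`, and `split_overlap` settings, may count words and handle the final chunk differently.

```python
def split_by_words(text: str, split_length: int = 200, split_overlap: int = 50) -> list:
    """Split text into chunks of `split_length` words overlapping by `split_overlap` words.

    Hypothetical helper: words are whatever str.split() returns.
    """
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # 200 - 50 = 150 new words per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this chunk already reaches the end; avoid a tail made only of overlap
    return chunks


text = " ".join(f"word{i}" for i in range(500))
chunks = split_by_words(text)
print(len(chunks))  # 3 chunks: words 0-199, 150-349, 300-499
```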

🧡 Some Hints:

  • Use FileTypeRouter to route files to the correct converters.
  • Use DocumentJoiner to join documents from multiple converters into one list of documents.
  • You have seen how to connect components on Day 1.
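The first two hints describe a fan-out/fan-in shape: files are sorted by type, each type goes to its own converter, and the converters' outputs are merged into a single list. The sketch below mimics that shape in plain Python so the data flow is easy to see; it is not how FileTypeRouter or DocumentJoiner are implemented. `route_by_mime_type`, `convert_and_join`, the "unclassified" bucket, and the placeholder converters are hypothetical, and the converters only tag the path instead of parsing the file. It assumes file types are detected by extension with Python's `mimetypes` module, registering `.md` explicitly because the built-in table may not include it.

```python
import mimetypes
from collections import defaultdict

# Python's built-in MIME table may not know ".md", so register it explicitly.
mimetypes.add_type("text/markdown", ".md")

# Hypothetical placeholder converters: real ones would read and parse each file.
CONVERTERS = {
    "text/plain": lambda path: f"[txt] {path}",
    "application/pdf": lambda path: f"[pdf] {path}",
    "text/markdown": lambda path: f"[md] {path}",
}


def route_by_mime_type(paths, mime_types):
    """Group paths by MIME type; unknown types go to an 'unclassified' bucket."""
    routed = defaultdict(list)
    for path in paths:
        mime, _encoding = mimetypes.guess_type(path)
        routed[mime if mime in mime_types else "unclassified"].append(path)
    return dict(routed)


def convert_and_join(paths):
    """Fan out to one converter per MIME type, then join into a single list."""
    routed = route_by_mime_type(paths, CONVERTERS.keys())
    joined = []
    for mime, convert in CONVERTERS.items():
        joined.extend(convert(path) for path in routed.get(mime, []))
    return joined


reports = ["winter_report_one.txt", "winter_report_two.pdf", "winter_report_three.md"]
print(convert_and_join(reports))
```

In the actual pipeline, each routed MIME type would feed its own Haystack converter, and DocumentJoiner would take the place of the final merge loop, handing one list of documents on to splitting, embedding, and writing.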

💚 Here is the Starter Colab