NLP Resources

Here are some links to resources about the core concepts of Natural Language Processing (NLP) that will help you get started with Haystack.

What is NLP?

Learn about what is possible when we apply computational power to language processing.

| Title | Type | Author | Description | Level |
| --- | --- | --- | --- | --- |
| Natural Language Processing (NLP) | Blog | IBM | High-level introduction to the tasks, tools, and use cases of NLP. | Beginner |
| Introduction to NLP | Video | Data Science Dojo | Covers many different tasks, from part-of-speech tagging to the creation of word embeddings. Contains some probabilistic notation. | Intermediate |
| Text Classification with NLP: Tf-Idf vs Word2Vec vs BERT | Blog with Code | Mauro Di Pietro | Hands-on, in-depth dive into text classification using TF-IDF, Word2Vec, and BERT. | Intermediate |

Search and Question Answering

There are many different flavors of search. Learn the differences between them and understand how the task of question answering can improve the search experience.

| Title | Type | Author | Description | Level |
| --- | --- | --- | --- | --- |
| Question Answering at Scale With Haystack | Blog | Branden Chan (deepset) | High-level description of the Retriever-Reader pipeline that gives some intuition about how it works and how it can be deployed. | Beginner |
| Understanding Semantic Search | Blog | Branden Chan (deepset) | Disambiguates search jargon and explains the differences between various styles of search. | Beginner |
| Haystack: The State of Search in 2021 | Blog | Branden Chan (deepset) | Description of the Retriever-Reader pipeline and an introduction to some complementary tasks. | Beginner |
| Modern Question Answering Systems Explained | Blog | Branden Chan (deepset) | Illustrated deeper dive into the inner workings of the Reader model. | Beginner |
| How to Build an Open-Domain Question Answering System? | Blog | Lilian Weng | Comprehensive look into the inner workings of a Question Answering system. Contains a lot of mathematical notation. | Advanced |

Text Vectorization and Embeddings

In NLP, text is often converted into a sequence of numbers called an embedding. Learn how embeddings are generated and why they are useful.

| Title | Type | Author | Description | Level |
| --- | --- | --- | --- | --- |
| What Is Text Vectorization? Everything You Need to Know | Blog | Branden Chan (deepset) | High-level overview of text vectorization starting from TF-IDF to Transformers. | Beginner |
| Word Embeddings for NLP | Blog | Renu Khandelwal | Gives good intuition of what word embeddings are and how we use them. Contains some helpful illustrations. | Intermediate |
| Introduction to Word Embedding and Word2Vec | Blog | Dhruvil Karani | A deeper dive into the CBOW and Skip Gram versions of Word2Vec. | Advanced |
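
To make the idea of turning text into numbers concrete, here is a minimal sketch of TF-IDF vectorization in plain Python. It is an illustration only: TF-IDF produces sparse, count-based vectors, not the learned dense embeddings that Word2Vec or Transformer models generate, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`), the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for this example. None of it comes from Haystack or from the resources listed here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumption: a smoothed IDF variant, so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # avoid division by zero for empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply",
]
vocab, vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])    # documents that share several words
sim_unrelated = cosine(vectors[0], vectors[2])  # documents that share no words
print(sim_related, sim_unrelated)
```

Once text is represented as vectors, similarity between documents becomes a numeric comparison. Here the two cat sentences score higher than the unrelated pair, which shares no words and so scores zero. Learned embeddings go further by placing texts with similar meaning close together even when they share no exact words.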

BERT and Transformers

The majority of the latest NLP systems use a machine learning architecture called the Transformer. BERT is one of the first models of this kind. Learn why these models were so revolutionary and how they work.

| Title | Type | Author | Description | Level |
| --- | --- | --- | --- | --- |
| From Language Model to Haystack Reader | Documentation | deepset | High-level overview of how language models, Readers, and prediction heads are all related. | Beginner |
| Intuitive Explanation of BERT- Bidirectional Transformers for NLP | Blog | Renu Khandelwal | Touches upon many of the concepts that are essential to understanding how Transformers work. | Beginner |
| A dummy’s guide to BERT | Blog | Nicole Nair | A good high-level summary of the BERT paper. | Beginner |
| Learn About Transformers: A Recipe | Blog | Elvis Saravia | Links to many other resources that give explanations or implementations of the Transformer architecture. | Intermediate |
| The Illustrated Transformer | Blog | Jay Alammar | Excellent visualization of the inner workings of transformer models. Gets quite deep into details. | Advanced |
| The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) | Blog | Jay Alammar | Excellent visualization of the inner workings of language models. Gets quite deep into details. | Advanced |