NLP Resources

The resources below cover the core concepts of Natural Language Processing (NLP) and will help you get started with Haystack.

What is NLP?

Learn what is possible when we apply computational power to language processing.

Title	Type	Author	Description	Level
Natural Language Processing (NLP)	Blog	IBM	High-level introduction to the tasks, tools, and use cases of NLP.	Beginner
Introduction to NLP	Video	Data Science Dojo	Covers many NLP tasks, from part-of-speech tagging to the creation of word embeddings. Contains some probabilistic notation.	Intermediate
Text Classification with NLP: Tf-Idf vs Word2Vec vs BERT	Blog with Code	Mauro Di Pietro	Hands-on, in-depth dive into text classification using TF-IDF, Word2Vec, and BERT.	Intermediate

Search and Question Answering

There are many different flavors of search. Learn the differences between them and understand how the task of question answering can improve the search experience.

Title	Type	Author	Description	Level
Question Answering at Scale With Haystack	Blog	Branden Chan (deepset)	High-level description of the Retriever-Reader pipeline that gives some intuition about how it works and how it can be deployed.	Beginner
Understanding Semantic Search	Blog	Branden Chan (deepset)	Disambiguates search jargon and explains the differences between various styles of search.	Beginner
Haystack: The State of Search in 2021	Blog	Branden Chan (deepset)	Description of the Retriever-Reader pipeline and an introduction to some complementary tasks.	Beginner
Modern Question Answering Systems Explained	Blog	Branden Chan (deepset)	Illustrated deeper dive into the inner workings of the Reader model.	Beginner
How to Build an Open-Domain Question Answering System?	Blog	Lilian Weng	Comprehensive look into the inner workings of a Question Answering system. Contains a lot of mathematical notation.	Advanced

Text Vectorization and Embeddings

In NLP, text is often converted into a sequence of numbers called an embedding. Learn how embeddings are generated and why they are useful.

Title	Type	Author	Description	Level
What Is Text Vectorization? Everything You Need to Know	Blog	Branden Chan (deepset)	High-level overview of text vectorization, from TF-IDF to Transformers.	Beginner
Word Embeddings for NLP	Blog	Renu Khandelwal	Gives a good intuition for what word embeddings are and how they are used. Contains some helpful illustrations.	Intermediate
Introduction to Word Embedding and Word2Vec	Blog	Dhruvil Karani	A deeper dive into the CBOW and Skip-gram versions of Word2Vec.	Advanced
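
To make the idea of turning text into numbers concrete, here is a minimal sketch in plain Python. It builds TF-IDF vectors, which are sparse count-based vectors rather than learned embeddings such as Word2Vec or BERT, and compares them with cosine similarity. The function names, the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions chosen for illustration; they are not part of Haystack.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for this toy example.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors), with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents does each token appear?
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vocab, vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])    # sentences that share words
sim_unrelated = cosine(vectors[0], vectors[2])  # sentences with no shared words
print(sim_related > sim_unrelated)
```

Because these vectors only count shared words, two sentences with the same meaning but different vocabulary would score as unrelated. Learned embeddings, covered in the resources above, are designed to address that limitation.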

BERT and Transformers

The majority of the latest NLP systems use a machine learning architecture called the Transformer. BERT was one of the first widely adopted models of this kind. Learn why these models were so revolutionary and how they work.

Title	Type	Author	Description	Level
From Language Model to Haystack Reader	Documentation	deepset	High-level overview of how language models, Readers, and prediction heads are related.	Beginner
Intuitive Explanation of BERT- Bidirectional Transformers for NLP	Blog	Renu Khandelwal	Touches upon many of the concepts that are essential to understanding how Transformers work.	Beginner
A dummy’s guide to BERT	Blog	Nicole Nair	A good high-level summary of the BERT paper.	Beginner
Learn About Transformers: A Recipe	Blog	Elvis Saravia	Links to many other resources that give explanations or implementations of the Transformer architecture.	Intermediate
The Illustrated Transformer	Blog	Jay Alammar	Excellent visualization of the inner workings of transformer models. Gets quite deep into details.	Advanced
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)	Blog	Jay Alammar	Excellent visualization of the inner workings of language models. Gets quite deep into details.	Advanced