Annotation Tool

Generate labels for question answering and document retrieval with ease using deepset's Annotation Tool. You can currently only use the local version of the tool.

Main Features

  • Create labels with different techniques: Come up with questions and answers while reading passages (SQuAD style), or start from a set of predefined questions and look for answers in the document (similar to Natural Questions).
  • Structure your work through organizations, projects, and users.
  • Upload your documents or import labels from a list of predefined questions.
  • Export your labels in SQuAD Format.
A screenshot of the Annotation Tool interface for adding question-answer pairs.
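
For readers unfamiliar with the SQuAD format mentioned in the feature list, the following is a minimal sketch of how such an export is laid out and how it could be read back. The sample content is made up, and which optional fields the Annotation Tool actually writes (for example `is_impossible`) is an assumption. The helper `iter_labels` is hypothetical and not part of the tool.

```python
import json

# A tiny, made-up export in SQuAD format (illustrative content only).
squad_export = json.loads("""
{
  "version": "v2.0",
  "data": [
    {
      "title": "Example document",
      "paragraphs": [
        {
          "context": "Berlin is the capital of Germany.",
          "qas": [
            {
              "id": "q1",
              "question": "Which city is Germany's capital?",
              "is_impossible": false,
              "answers": [{"text": "Berlin", "answer_start": 0}]
            }
          ]
        }
      ]
    }
  ]
}
""")


def iter_labels(squad):
    """Yield (question, answer_text, answer_start, context) for every answer."""
    for document in squad["data"]:
        for paragraph in document["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa.get("answers", []):
                    yield qa["question"], answer["text"], answer["answer_start"], context


labels = list(iter_labels(squad_export))
for question, text, start, context in labels:
    # In SQuAD format, answer_start is a character offset into the context.
    assert context[start:start + len(text)] == text
    print(f"{question!r} -> {text!r} (offset {start})")
```

Checking that each `answer_start` offset really points at the answer text is a cheap sanity check to run on an export before training on it.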

To use the Annotation Tool, install it locally.

Installing the Local Version (Docker)

  1. Configure the credentials and the database in the docker-compose.yml file. The credentials must be the same in the database image settings and in the application configuration.
     DEFAULT_ADMIN_EMAIL: "[email protected]"
     DEFAULT_ADMIN_PASSWORD: "DEMO-PASSWORD"

     DB_HOSTNAME: "db"
     DB_NAME: "databasename"
     DB_USERNAME: "somesafeuser"
     DB_PASSWORD: "somesafepassword"

     POSTGRES_USER: "somesafeuser"
     POSTGRES_PASSWORD: "somesafepassword"
     POSTGRES_DB: "databasename"

     COOKIE_KEYS: "somesafecookiekeys"
     JWT_SECRET: "somesafesecret"
  2. Run Docker Compose with the command docker-compose up. You should then be able to access the UI at localhost:7001.
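
The requirement that the credentials match between the application settings and the database image can be checked with a small script before starting the containers. This is only a sketch: the values are copied from the example configuration, while splitting them into an "application" group and a "database image" group, and the helper name `find_mismatches`, are assumptions made for illustration.

```python
# Values copied from the example configuration. Grouping them into
# application settings and database-image settings is an assumption.
app_env = {
    "DB_NAME": "databasename",
    "DB_USERNAME": "somesafeuser",
    "DB_PASSWORD": "somesafepassword",
}
db_env = {
    "POSTGRES_DB": "databasename",
    "POSTGRES_USER": "somesafeuser",
    "POSTGRES_PASSWORD": "somesafepassword",
}

# Which application setting should equal which database-image setting.
PAIRS = [
    ("DB_NAME", "POSTGRES_DB"),
    ("DB_USERNAME", "POSTGRES_USER"),
    ("DB_PASSWORD", "POSTGRES_PASSWORD"),
]


def find_mismatches(app, db):
    """Return the (app_key, db_key) pairs whose values differ or are missing."""
    return [
        (app_key, db_key)
        for app_key, db_key in PAIRS
        if app.get(app_key) is None or app.get(app_key) != db.get(db_key)
    ]


mismatches = find_mismatches(app_env, db_env)
print(mismatches or "credentials match")
```

An empty result means the three pairs agree. Any pair that is returned names the two settings to compare by hand.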

User Guide

There is a User Guide for an earlier version of the Annotation Tool. While it doesn't include all the latest features, the basic workflow and the tips for label quality are still the same.

Annotation FAQ

What is a good question?

  • A good question is a fact-seeking question that can be answered with an entity (person, organization, location, etc.) or an explanation. A bad question is ambiguous, incomprehensible, dependent on clearly false presuppositions, opinion-seeking, or not clearly a request for factual information.
  • The question should ask about information present in the text passage given. It should not require additional information or interpretation.
  • Do not copy-paste answer text into the question. Good questions do not contain the exact same words as the answer or the context around the answer. The question should be a reformulation that uses synonyms and a different word order than the context of the answer.
  • Questions should be precise, natural questions of the kind you would ask another person when you want information.
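
The rule against copying answer text into the question can be approximated with a simple word-level check. The sketch below is not a feature of the Annotation Tool. The function names and the naive tokenizer are assumptions for illustration, and a high overlap score is only a hint that a question may need rephrasing, not a verdict.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def answer_leaks_into_question(question, answer):
    """True if the whole answer appears word-for-word inside the question."""
    q, a = tokens(question), tokens(answer)
    if not a:
        return False
    return any(q[i:i + len(a)] == a for i in range(len(q) - len(a) + 1))


def word_overlap(question, context):
    """Fraction of distinct question words that also occur in the context."""
    q = set(tokens(question))
    return len(q & set(tokens(context))) / len(q) if q else 0.0


context = "Berlin is the capital of Germany."
print(answer_leaks_into_question("Is Berlin the capital of Germany?", "Berlin"))
print(answer_leaks_into_question("Which city is Germany's capital?", "Berlin"))
print(word_overlap("Which city is Germany's capital?", context))
```

The first question repeats the answer and is flagged. The second one asks for the same fact without containing it.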

How many questions should you ask per text passage?

  • Ask at most 20 questions per passage.
  • Some text passages are not suited for 20 questions. Don't make up contrived or overly complicated questions just to reach 20. Move on to the next text instead.
  • Try to ask questions covering the whole passage and focus on questions covering important information. Do not only ask questions about a single sentence in that passage.

What is a good answer span?

  • Always mark whole words. Do not start or end the answer within a word.
  • For short answers: The answer should be as short and as close to a spoken human answer as possible. Do not include punctuation.
  • For long answers: Please mark whole sentences, including punctuation. The sentences can repeat parts of the question. You can also mark whole text passages, but only if they are not too large (for example, no more than 8-10 sentences).
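
The whole-word rule for answer spans can be expressed as a check on character offsets. This is a minimal sketch under the assumption that a span is given as a start offset and an exclusive end offset into the passage text. The function `is_whole_word_span` is hypothetical and treats any run of letters or digits as a word.

```python
def is_whole_word_span(context, start, end):
    """True if context[start:end] does not begin or end inside a word."""
    if not (0 <= start < end <= len(context)):
        return False
    # The span starts mid-word if the characters on both sides of the
    # start boundary are letters or digits; likewise for the end boundary.
    starts_mid_word = start > 0 and context[start - 1].isalnum() and context[start].isalnum()
    ends_mid_word = end < len(context) and context[end - 1].isalnum() and context[end].isalnum()
    return not (starts_mid_word or ends_mid_word)


context = "Berlin is the capital of Germany."
print(is_whole_word_span(context, 0, 6))  # span "Berlin"
print(is_whole_word_span(context, 0, 4))  # span "Berl", which ends inside a word
```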

How do I differentiate long vs short answers?

  • If a short answer is possible, always select it over a long answer.
  • Short, precise answers such as numbers or a few words are short answers.
  • Long answers are lists of possibilities, or answers that need multiple sentences to answer the question correctly.

How to handle multiple possible answers to a single question?

  • As of now, there is no functionality to mark multiple answers per single question.
  • Workaround: You can add a question with the same text but a different answer selection by using the Custom Question button below the question list.

What to do with grammatically wrong or incorrectly spelled questions?

  • Include them. When users use the tool and ask questions, their questions will likely contain grammar and spelling errors too.
  • Exception: The question needs to be understandable without reading and interpreting the corresponding text passage. If you do not understand the question, please mark the question as “I don’t understand the question”.

What to do with text passages that are not properly converted or contain (in part) information that cannot be labeled (for example, just lists or garbage text)?

  • Please do not annotate this text.
  • You can write down what is missing or the reason why you cannot label the text, along with the text number and title.

Which browser to use?

  • Please use the Chrome browser. The tool has not been tested on other browsers.