The Art of Annotation: How to design effective guides for computer vision annotation tasks
By Laura Chrobak and Filippo Varini
Curating annotations is key to building supervised computer vision models, but it’s not without challenges. Two critical sticking points often arise: clearly defining the objective of the task, and ensuring all annotators understand and follow a consistent annotation protocol.
Hi there! I’m Laura Chrobak, an MLOps research engineer at MBARI working on the FathomNet program. I’ve managed large-scale annotation tasks with international teams to create datasets for ocean-related machine learning models. My colleague Filippo, a machine learning engineer at OceanX, develops computer vision projects to classify marine species, which require careful annotation. We have learned that clear guidelines are essential for top-quality annotations. In this guide, we’ll share our tips for creating effective annotation guidelines to improve dataset quality and ultimately boost performance in your own machine learning applications.
Creating a user guide can streamline the process, align objectives, and establish consistency among annotators. Here's a step-by-step breakdown of how to design an effective annotation guide. Remember that every model should have a clear goal, and thus its own tailored user guide.
1. Best Practices for Clarity
Keep it Simple: Use clear, concise language.
Rely on Visuals: Green checkmarks for "do this" and red Xs for "avoid this" are universally understood.
2. Structuring the User Guide
For annotation tasks, we recommend creating a slide deck with the following structure:
a. Project Goals
Provide context and purpose for the annotation task. Depending on the audience, this section may include:
The overarching goal of the project,
Specific details about the task,
The timeline for completion.
b. Annotation Guidelines
Clearly define annotation rules for those involved in your project. It is helpful to break them down into "do's" and "don'ts" for clarity. Typical topics to address include:
Occlusion
Define how to handle objects that are partially out of frame or occluded.
Accuracy
Specify the precision required for annotations. For instance, should bounding boxes tightly hug objects or is a looser fit acceptable? Balancing precision and labeling speed is often critical for the annotation task.
Distinctness
Clarify how to handle blurry or distant objects. For example:
Label to the lowest taxonomic level you’re confident in.
If unsure, use a generic label like “marine organism.”
Allow annotators to use context from the full image (e.g., location or surrounding objects) for classification.
Clarify how to label objects when it is difficult to distinguish individual organisms (e.g., a sponge aggregation).
Number of Annotations
Set expectations for how many objects to label in an image. In crowded scenes, it may be impractical to label every individual distinctly. For example:
If counting individuals is essential, focus on accurate bounding boxes.
For general classification tasks, prioritize differentiating objects from the background.
Flagging images with excessive annotations can also help estimate task completion time.
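As a rough illustration of such a check, here is a minimal Python sketch that flags crowded images. The per-image record format with an `annotations` list and the threshold value are hypothetical; adapt both to whatever export format your annotation tool produces.
```python
# Minimal sketch: flag crowded images so they can be reviewed or budgeted separately.
# The per-image record format ({"image_id": ..., "annotations": [...]}) is a hypothetical
# example; adapt it to the export format of your annotation tool.

MAX_ANNOTATIONS_PER_IMAGE = 50  # project-specific threshold


def flag_crowded_images(image_records, max_annotations=MAX_ANNOTATIONS_PER_IMAGE):
    """Return the IDs of images whose annotation count exceeds the threshold."""
    return [
        record["image_id"]
        for record in image_records
        if len(record.get("annotations", [])) > max_annotations
    ]


# Toy example: the second frame is a dense krill swarm and gets flagged.
records = [
    {"image_id": "dive_001_frame_10", "annotations": [{"label": "fish"}] * 12},
    {"image_id": "dive_001_frame_11", "annotations": [{"label": "krill"}] * 300},
]
print(flag_crowded_images(records))  # -> ['dive_001_frame_11']
```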
Provide examples
Include visual examples for ambiguous cases (see the Visual Examples section below); these are often more effective than written rules.
c. Class Descriptions
For classification tasks, include:
Positive examples for each class.
Examples of common misclassifications to avoid.
d. Visual Examples
Use annotated images to demonstrate:
Correct applications of rules (marked with green checkmarks).
Common mistakes to avoid (marked with red Xs).
Exemplars of each concept for classification tasks, alongside confusing look-alikes.
Visual aids are particularly effective for multilingual teams.
3. Encouraging Annotator Feedback
Encourage annotators to document unclear cases by adding slides to your user guide with examples or images they are confused about. This allows the team to collectively decide on the best approach and refine the guide as needed.
4. Programmatic Checks for Quality Assurance
Beyond setting clear expectations, programmatic checks can ensure annotators adhere to the guidelines. Consider implementing:
Golden Images: Pre-annotated images used to test annotator accuracy.
A golden dataset can be used to evaluate model performance as well as rater performance during the annotation process.
For real-world examples, see Scale AI's “golden dataset” tool.
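To make the golden-image idea concrete, here is a minimal sketch of an accuracy check for bounding boxes. The box format (x_min, y_min, x_max, y_max), the class-agnostic matching, and the 0.5 IoU threshold are assumptions for illustration; a real QA pipeline would match boxes more carefully and check class labels too.
```python
# Minimal sketch: compare an annotator's boxes against golden (pre-annotated) boxes.
# Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples; matching is class-agnostic,
# which is a simplification of what a production QA pipeline would do.

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x_min, y_min, x_max, y_max) format."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def golden_recall(golden_boxes, annotator_boxes, iou_threshold=0.5):
    """Fraction of golden boxes the annotator recovered above the IoU threshold."""
    if not golden_boxes:
        return 1.0
    hits = sum(
        1 for g in golden_boxes
        if any(iou(g, a) >= iou_threshold for a in annotator_boxes)
    )
    return hits / len(golden_boxes)


# Toy example: only one of the two golden objects was labeled.
golden = [(10, 10, 50, 50), (60, 60, 100, 100)]
submitted = [(12, 11, 49, 52)]
print(golden_recall(golden, submitted))  # -> 0.5
```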
Consensus Protocols: Automated checks to compare annotator outputs and identify discrepancies.
For more detail on gold data and annotator agreement, I found the paper “Analyzing Dataset Annotation Quality Management in the Wild” particularly helpful!
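As a simple sketch of a consensus check (assuming a hypothetical setup where several annotators classify each image), the snippet below flags images that lack a clear majority label so they can be routed back for review or discussion.
```python
# Minimal sketch: flag images where annotators fail to reach a majority label.
# Assumes a hypothetical structure mapping image IDs to the labels each annotator assigned.
from collections import Counter


def find_disputed_images(labels_by_image, min_agreement=2 / 3):
    """Return image IDs where no single label reaches the required fraction of votes."""
    disputed = []
    for image_id, labels in labels_by_image.items():
        if not labels:
            continue
        _, top_count = Counter(labels).most_common(1)[0]
        if top_count / len(labels) < min_agreement:
            disputed.append(image_id)
    return disputed


# Toy example with three annotators per image:
votes = {
    "frame_01": ["rockfish", "rockfish", "rockfish"],    # full agreement
    "frame_02": ["sponge", "coral", "marine organism"],  # no majority -> review
}
print(find_disputed_images(votes))  # -> ['frame_02']
```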
5. General Advice
Perform a few annotations yourself to estimate the time required for the task.
For organisms that are hard to distinguish at the individual level (e.g., amorphous shapes), it's acceptable to annotate multiple individuals within the same bounding box, especially if the model’s goal is not to strictly count individuals.
Continuously refine your user guide based on annotator feedback and observed inconsistencies. It’s important to explicitly treat these user guides as living documents that will continue to be improved throughout the annotation task.
By creating a clear and comprehensive user guide, you can save time, reduce confusion, and ensure high-quality annotations for your computer vision models. Happy labeling!
If you have questions or more suggestions for improving the art of annotation, leave a comment below!