Seeing and then identifying objects is the foundation for all learning and cognition. This is true both in human development and in the development of training data for computer vision models.
In order to explore the pros and cons of quality metrics like pixel tolerance and Intersection over Union (IOU), it helps to zoom out a bit and ask ourselves: why do we need these two quality metrics in order to get high-quality training data?
The Importance of Precise Object Localization
Your human mind is absolutely extraordinary. Think about the ease and rapidity with which you learned to identify and classify objects:
- You were processing and recognizing familiar faces within days of your birth.
- You were around three months old when you first started to recognize a favorite teddy bear or chew toy.
- By nine months, you could see a picture of an object and make the connection between the representation and the real thing.
The purpose of data annotation for computer vision is to teach a model how to identify and classify things. The human mind “annotates” effortlessly: we see something and we identify it (with varying degrees of specificity and accuracy based on prior experience). So far, not that dissimilar from how a model learns.
However, even if we aren’t sure what something is, we don’t have any trouble seeing that there is something in front of us and that the thing has a distinct shape and size that distinguishes it from other objects. Except at great distances or in visually chaotic scenes, the average person with good eyesight has little trouble perceiving where one object in the field of vision ends and another begins, which is what we could call precise object localization.
Precise object localization can be more challenging for machine learning models.
How Do We Localize Objects in Data Annotation for Computer Vision?
We put a bounding box around it! If you are picturing the kindergarten workbook where you have to draw a circle around the apple, you’re not far off. However, unlike for the kindergartner, it is crucial that the human annotator or computer vision model draws the box with precision.
The first steps of annotation are always about identifying 1) that an object exists and 2) that the object occupies a discrete space in the frame (e.g. by putting a bounding box around it). Pixel tolerance and IOU are our best tools for measuring how precise our object localization is.
Pixel tolerance and IOU allow us to measure the quality of bounding box placement by measuring the difference between a known correct answer (called an authoritative answer, or sometimes a ground truth or gold answer) and an answer being tested (provided by a human worker or generated by a model).
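To make the comparison concrete, here is a minimal Python sketch of both checks. The coordinate convention, the helper names (`iou` and `within_pixel_tolerance`), and the example boxes are illustrative assumptions, not the implementation of any particular annotation platform.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    # Coordinates of the overlapping region, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def within_pixel_tolerance(box_a, box_b, tolerance):
    """Pixel tolerance check: every edge of one box is within
    `tolerance` pixels of the corresponding edge of the other."""
    return all(abs(a - b) <= tolerance for a, b in zip(box_a, box_b))


ground_truth = (100, 100, 200, 200)  # authoritative answer
candidate = (105, 98, 207, 201)      # worker- or model-provided answer

print(f"IOU: {iou(ground_truth, candidate):.3f}")
print("Within 10px tolerance:", within_pixel_tolerance(ground_truth, candidate, 10))
```

Here the candidate box scores an IOU of about 0.86 against the ground truth, and every edge lands within 10 pixels, so it would pass both checks at those thresholds.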
Now, let’s talk about the differences between these two localization metrics and explain why IOU is a more robust metric.
So, Which Is Better? IOU or Pixel Tolerance?
They are both useful, but we prefer the ratio metric (IOU) to the distance metric (pixel tolerance).
Ratio metrics express the difference between a ground truth answer and a provided answer as a proportion, while distance metrics express it as an absolute difference on a fixed scale (such as pixels in the frame).
Both ratio and distance metrics have their place depending on what you are measuring, but a ratio metric holds up across a wider variety of cases because it is scale-invariant and domain-agnostic.
Intersection over Union is the standard metric in the machine learning discipline for bounding boxes and other shapes because it applies equally well to large and small shapes, and it maps easily onto a distance metric like pixel tolerance. Because IOU is a ratio, a fixed IOU threshold implicitly demands tighter pixel tolerances for small shapes and permits looser pixel tolerances for large shapes: a 5-pixel error is negligible on a 400-pixel box but significant on a 20-pixel one.
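To see that scale invariance in action, the short sketch below (using the same hypothetical box convention as the earlier example) applies an identical 5-pixel shift to a large box and a small box:

```python
def iou(a, b):
    """Intersection over Union, same convention as the sketch above."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def shifted(box, dx):
    """Shift a box horizontally by dx pixels."""
    x0, y0, x1, y1 = box
    return (x0 + dx, y0, x1 + dx, y1)


large = (0, 0, 400, 400)  # 400 x 400 pixel box
small = (0, 0, 20, 20)    # 20 x 20 pixel box

# The same absolute 5-pixel shift applied to both boxes.
print(f"Large box IOU after 5px shift: {iou(large, shifted(large, 5)):.3f}")  # ~0.975
print(f"Small box IOU after 5px shift: {iou(small, shifted(small, 5)):.3f}")  # ~0.600
```

The large box keeps an IOU of roughly 0.97 while the small box drops to 0.60, so a single IOU threshold (say, 0.9) effectively enforces a much tighter pixel tolerance on the small shape without anyone having to tune a per-size pixel budget.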
Curious about Getting High-Quality Training Data for Your ML/CV Project?
This is where we specialize. We have a whole guide that walks through the four prioritized phases to quality training data, explains the significance of specific metrics for quality, and discusses how our Customer Success team partners with you to ensure the quality you need.
We’ve established quality management best practices and a reliable pipeline for quality training data based on our experience labeling tens of millions of images, video, text, and audio records alongside our customers.
At Alegion, it’s not just the platform that supports your quality training data needs, it’s our whole team of experts, dedicated to your success.
Every CV team needs high-quality training data; let’s use our expertise to get you there, together.
Reach out to us today and request a demo to learn how we can help you avoid bias in your training data.