How Does Your Video Annotation System Stack Up?
When we released our new video annotation (VA) solution last fall (available via Alegion Control and within our managed platform), we decided to run a couple of tests to see how its efficiency and performance measured up to the leading open source and commercial offerings.
In addition to enhanced support for long videos with dense annotations, Alegion’s latest VA system features an entirely new user experience, refined over two years of work on complex, large-scale video annotation projects.
We were particularly interested in assessing how well this new user experience reduced task time and increased annotation quality. We chose a couple of key competitors, built a moderately complex use case, and conducted some unscientific but very enlightening time trials.
Here’s how we did it, and what we learned:
What We Tested
In addition to Alegion Video Annotation, we tested the open source Computer Vision Annotation Tool (CVAT) and another well-known tier 1 commercial labeling platform (let's call it ‘Brand X’ for short). All three are self-service systems that allow a team to quickly upload assets and set up an ontology.
The Use Case
When we selected the video and defined the annotation guidelines, we made choices that modeled some of our real-world projects while avoiding bias toward any of the products. Because we were primarily interested in worker efficiency and time on task rather than scalability, we kept the clip short, though long enough to provide a good test of efficiency and playback performance.
We had annotators already familiar with each annotation system complete 200 frames of a 4K video. Appropriately, the video is a clip of a sparring match between boxers.
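The guidelines also required annotators to associate each keypoint with the correct boxer. While we could have post-processed these relationships based on bounding box containment, that is not possible in many use cases, for example when two entities' boxes overlap. We included this reference requirement because we see it requested in many of our projects.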
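To illustrate why containment-based post-processing breaks down, here is a minimal Python sketch. The `Box` class, the `associate_by_containment` function, and the coordinates are hypothetical and not part of any Alegion, CVAT, or Brand X API; the sketch assigns a keypoint to the one box that contains it and returns `None` when the answer is ambiguous, as happens when two boxers' boxes overlap.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical format)."""
    entity_id: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def associate_by_containment(
    keypoint: Tuple[float, float], boxes: Sequence[Box]
) -> Optional[str]:
    """Return the id of the only box containing the keypoint, or None when
    zero boxes or more than one box contain it (ambiguous)."""
    x, y = keypoint
    matches = [box.entity_id for box in boxes if box.contains(x, y)]
    return matches[0] if len(matches) == 1 else None


# Two boxers whose boxes overlap between x=350 and x=400 (made-up coordinates).
boxes = [
    Box("boxer_1", 100, 50, 400, 900),
    Box("boxer_2", 350, 60, 700, 910),
]

print(associate_by_containment((200, 300), boxes))  # boxer_1
print(associate_by_containment((375, 300), boxes))  # None: inside both boxes
```

A keypoint that falls inside both boxes cannot be assigned from geometry alone, which is why the association has to be recorded explicitly by the annotator.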
Time on task was our primary metric, and we also focused on quality assurance (QA) efficiency. Verifying label quality is part of every Alegion project, but it can be a distinctly different experience from annotation.
The QA process was kept simple: take the time to verify the localization quality, ensure the accuracy of any classifications, and ensure the keypoints were associated with the correct boxer. No corrections were made in QA.
What We Learned
The difference in time on task for workers between Alegion and the competition was substantial. Annotator notes recorded after each annotation session confirmed some key differences that drove efficiency gains in Alegion’s favor.
[Chart: Average Annotation Time]
[Chart: Average Quality Assurance (QA) Time]
In addition to the timing results, we used annotator interviews to gain further insight into the tooling capabilities that set Alegion’s VA solution apart from its competitors.
A User Experience Purpose Built For Video
We found the largest time savings and reduction in fatigue came from fewer clicks, less mouse travel, and less context switching. As the number of frames and annotations increases, small UX inefficiencies add up, particularly in cases that require dense annotation.
Some key drivers of efficiency are obvious. Well designed shortcut keys and viewing options are essential, and each product has its strengths in this area.
However, only Alegion has a user experience designed from the ground up for working with multiple entities over time. Its timeline view, for example, is the only one of the three designed for working with multiple entities simultaneously. Being able to browse and edit multiple entities and their keyframes in one view drastically reduces clicks and mouse travel, and keeping the relationships between entities in view lessens context switching and cognitive load.
Keyframe and context-sensitive classification tools allow annotators to perform most tasks inside the annotation window. Some of these capabilities delivered their greatest benefits in QA, where verifying classifications and associations between entities is the bulk of the work.
A Real-World Classification Model
Associations between labeled entities are a common requirement across a wide variety of use cases. Sports, security, and retail commonly feature associations between people, body parts, and objects such as shopping items. In complex use cases, the number of these relationships can be open-ended.
Of the three platforms tested, Alegion was the only system with a built-in model for associations. Hierarchical relationships such as skeletal points are also built into the Alegion user interface, so they can be defined and verified according to their logical structure. In the competing systems we tested, we had to define arbitrary ID fields as lookups between entities. These felt like workarounds; more importantly, they were error-prone and time-consuming because they required manual entry and updates across frames of video. This inefficiency was compounded in QA, when these field values had to be rechecked.
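The difference between the two approaches can be sketched in Python. Both schemas below are hypothetical illustrations, not the actual data model of Alegion, CVAT, or Brand X: in the workaround, every per-frame keypoint label carries a typed `boxer_id` lookup field that QA must recheck, while in the association model the keypoint entity references its parent boxer once.

```python
from dataclasses import dataclass
from typing import Optional

# Workaround: a free-text lookup field repeated on every frame
# (hypothetical schema). A single typo silently breaks the link.
per_frame_labels = [
    {"frame": 0, "keypoint": "left_wrist", "boxer_id": "boxer_1"},
    {"frame": 1, "keypoint": "left_wrist", "boxer_id": "boxer_1"},
    {"frame": 2, "keypoint": "left_wrist", "boxer_id": "boxr_1"},  # typo
]


def find_broken_lookups(labels, valid_ids):
    """Return the frames whose lookup field points at no known boxer."""
    return [label["frame"] for label in labels if label["boxer_id"] not in valid_ids]


# Association model: the relationship is stored once on the entity
# (hypothetical schema) instead of being retyped on every frame.
@dataclass
class Entity:
    entity_id: str
    label: str
    parent: Optional["Entity"] = None


boxer = Entity("boxer_1", "boxer")
left_wrist = Entity("kp_left_wrist", "left_wrist", parent=boxer)

print(find_broken_lookups(per_frame_labels, {"boxer_1", "boxer_2"}))  # [2]
print(left_wrist.parent.entity_id)  # boxer_1
```

Note that the lookup check only catches IDs that match no boxer at all; a valid but wrong ID (for example, `boxer_2` typed on a `boxer_1` keypoint) passes silently, which is part of why rechecking these fields in QA is so slow.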
Performant Playback and Annotation Synchronization
Even with the modest number of frames in this test, Alegion’s ability to stream 4K video smoothly and keep dense annotations in sync was a key factor in worker efficiency. In both annotation and QA, smooth, synchronized playback makes it much faster to verify that classifications were applied precisely and that localizations are accurate. With the timeline scrubber and smooth playback, annotators can review localizations quickly and precisely without compensating for inaccuracies caused by playback lag.
Most video annotation systems are derived from image annotation tools, and this leads to inefficiency and worker fatigue as annotation density and scale increase.
When we designed the Alegion Video Annotation system, we treated video as a first-class data type. Video is a domain with a unique set of challenges, but also some well-understood and proven solutions. A system built specifically for video, combined with a rich classification model, gives workers a solution that excels as the complexity and scale of a video annotation project increase.
The Alegion video annotation system delivers high quality labeled data, reduces worker fatigue, and saves teams time and money.