Eliminating the burden of annotation: Taking humans out of the equation
The most ingenious, well-designed ML models are only as good as the annotated data used to train them. But while image-based data has traditionally relied on humans to identify and label every component in every scene, the burden of annotation is being steadily lifted by the growing use of synthetic data.
By its simplest definition, Machine Learning is about teaching an algorithm to draw conclusions by comparing what it sees to what it knows. But teaching a computer is not like teaching a human: the training process must be rigorously systematic and precise to teach a machine effectively.
The algorithm’s source of “what it knows” comes, of course, from the training data it is fed. And the algorithm’s success or failure depends on three key drivers of that data: quantity, quality, and scope. Annotation, an initial stage of the ML training process, relies on all three of these factors. To teach the algorithm to accurately identify an object of interest (be it a rusty pipe or a ship), a human subject-matter specialist has to work through a massive body of imagery, manually tagging every element. This annotation becomes critical metadata, attached to each image to identify its significance. To meet our three requirements, that annotation needs to be extensive (quantity), accurate (quality), and cover a broad enough list of variables (scope). And therein lies the bottleneck, the financial overhead, and even, for some companies, the showstopper. In a rapidly growing market, ML-based companies simply cannot afford problems that can derail the algorithm training process.
The Key Problems with Manual Annotation
Cost in time and money — Annotation is labor-intensive and therefore expensive. For instance, in urban aerial photography, one hi-res image can contain hundreds (or even thousands) of items to annotate. For each individual object, it may be necessary to first indicate its presence, name it, and then add its location, count, size, shape, color, and so on. When this initial stage relies on human involvement, it simply does not scale as the amount of imagery and the number of items to annotate grows. This factor also limits annotation sophistication: drawing a simple bounding box around an object is much quicker (and therefore far cheaper) than manually generating a segmentation map of that same object, to say nothing of forms a human labeler cannot produce at all, such as depth maps. By settling for simpler annotations, we also settle for lower performance and a narrower range of supported applications.
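To make the difference in labeling effort concrete, here is a minimal, illustrative Python sketch (the field names, image size, and pixel values are hypothetical and not tied to any particular annotation tool) contrasting a bounding box, a segmentation map, and a depth map for the same object:

```python
import numpy as np

# A bounding box is just four numbers per object: quick for a human to draw.
bbox_annotation = {
    "label": "vehicle",
    "bbox_xywh": [412, 230, 96, 48],  # x, y, width, height in pixels
}

# A segmentation map assigns a class to every pixel: far richer information,
# but far slower and more error-prone when produced by hand.
image_h, image_w = 1024, 1024
segmentation_map = np.zeros((image_h, image_w), dtype=np.uint8)  # 0 = background
segmentation_map[230:278, 412:508] = 1                           # 1 = vehicle pixels

# A depth map (distance per pixel) is something a human labeler cannot
# realistically produce at all for an ordinary photograph.
depth_map = np.full((image_h, image_w), np.nan, dtype=np.float32)
```

The bounding box takes seconds to draw; the pixel-accurate mask can take minutes per object; the depth map is effectively out of reach for manual labeling.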
Accuracy — Human annotation is prone to mistakes and inconsistency. This is especially true when, as is often the case, the work is farmed out to an external supplier. Using such outsourced workforces is no guarantee of expertise. In fact, it is often the exact opposite, with these suppliers hiring the cheapest labor possible, so it is risky to treat their annotations as accurate. Even with highly trained in-house annotators, the repetitive work eventually triggers fatigue and boredom, and attention to detail wanes. These realities often mean that developers must add an extra step of reviewing and standardizing annotations into normalized categories. On top of that, ensuring accuracy and consistency typically means comparing answers across annotators, so every annotation is done not once but three or more times. Some companies try to use in-house employees who better appreciate the importance of consistency, but they will still make mistakes, fall short of 100% consistency in their annotations, and see their other tasks pushed to the back burner.
The business impact of making a mistake — A poorly functioning algorithm is not like an underpowered car (annoyingly slow or “less useful”). It is often a source of measurable loss when used to make decisions. Flawed annotation can lead to an algorithm failing to spot a weak point in a bridge or a corroded pipeline, which can trigger significant costs and even result in disasters.
Annotation is also a highly specialized task. It requires dedicated tools, techniques, terminology, training, and, in many cases, subject-matter experts. Even with all of these in place, the process remains cumbersome and inconsistent.
It’s no surprise, then, that most organizations struggling with AI and ML projects complain that their biggest problems concern data quality, data labeling, and building model confidence.
The Power of Automatic Annotation
Synthetic data solves the problems associated with manual annotation. It eliminates labor costs and human error while accelerating time to market.
When creating synthetic imagery, annotation is built in as an integral, automatic part of the generation process. For instance, an overhead shot of a 20-year-old green vehicle an hour before sundown can be labeled with those and any associated facts automatically as the image is created. With no humans involved, the annotation is 100% consistent and accurate.
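As a rough sketch of the idea (the render_scene function and parameter names below are hypothetical stand-ins for whatever synthetic data engine is used), the labels can simply be emitted alongside each rendered image:

```python
# Sketch only: `render_scene` stands in for a hypothetical synthetic data
# engine; it is not a real library call.
def generate_labeled_sample(render_scene, params):
    image = render_scene(**params)  # e.g. view="overhead", color="green", ...
    annotation = {
        "view": params["view"],
        "vehicle_age_years": params["vehicle_age"],
        "color": params["color"],
        "minutes_to_sunset": params["minutes_to_sunset"],
    }
    return image, annotation  # labels are exact by construction, no human in the loop
```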
With this approach, the challenge of scalability, whether in the number of images, randomness, or variations, fades away. In seconds, we can create a thousand permutations of that vehicle as we mix and match sensor angles, lighting, paint color, materials, rain, snow, and any other variables required. The real beauty of synthetic data is that every one of the thousands of image versions that emerge from the process comes fully annotated.
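A minimal sketch of that mix-and-match step, assuming hypothetical parameter lists, is just a Cartesian product over scene parameters; every combination defines a new, pre-annotated sample:

```python
from itertools import product

# Hypothetical parameter grids; each combination defines one synthetic scene.
sensor_views = ["overhead", "oblique_30", "oblique_60"]
lighting     = ["noon", "golden_hour", "overcast", "night"]
paint_colors = ["green", "white", "red", "black", "silver"]
weather      = ["clear", "rain", "snow"]

scene_params = [
    {"view": v, "lighting": l, "color": c, "weather": w}
    for v, l, c, w in product(sensor_views, lighting, paint_colors, weather)
]
print(len(scene_params))  # 3 * 4 * 5 * 3 = 180 scene variations from four short lists
```

Adding a fifth list of ten materials would multiply this to 1,800 variations, with no additional labeling work.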
The annotation stage of ML training cannot be eliminated entirely, at least not until so much of the world has been processed and cataloged that computers can serve as “virtual content experts,” annotating data for other computers to train on. But as AI takes center stage in so many industries, synthetic data and its ability to remove annotation as a burden in the algorithm training process are, unsurprisingly, becoming a highly prized asset.