June has been a wild, all-hands-on-deck kind of month! We went out into the field to talk with customers and research scientists on the cutting edge of AI and ML as sponsors of five conferences: CVPR, the International Conference on Machine Learning (ICML), the Energy Drone & Robotics Summit, the O'Reilly AI Conference in Beijing, and AI World Government.
At all of these venues we saw examples of both unsupervised and supervised machine learning (ML) models. Both, of course, demand lots of data. In broad terms, though, unsupervised models learn by identifying characteristics of a data set on their own, while supervised models learn from labeled examples and answer predefined questions about each data point.
Unsupervised learning offers the promise of harnessing vast amounts of data without requiring external data labeling. It has proven well suited to exploratory objectives, such as clustering for customer segmentation or powering recommendation systems.
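To make the clustering idea concrete, here is a minimal sketch of k-means in plain Python. The customer figures, the feature choice, and the `kmeans` helper are all hypothetical and exist only for illustration; a real project would more likely use a library implementation and scale the features first.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, cluster index for each point)."""
    # Assumption: seed the centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20), (40, 25), (3, 22), (42, 30), (1, 18), (38, 27)]
centroids, segments = kmeans(customers, k=2)
print(segments)  # one segment index per customer; no labels were supplied
```

Note that nobody told the algorithm what the segments mean. It only groups similar customers, and interpreting the groups (say, occasional versus frequent buyers) is left to a person.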
The majority of the ML models we saw in development, however, rely on supervised learning. This learning-by-example approach answers prescribed questions about data points in a data set. The more prescriptive the use case, the better the fit for supervised learning. One example is an ML system that identifies wind turbine damage from drone video footage. By seeing numerous labeled examples, the model learns to assess whether damage is present, enabling better management of safety, cost, and logistics for repairs.
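As a rough illustration of learning by example, the sketch below trains a nearest-centroid classifier on a handful of labeled points. It is a deliberately simple stand-in, not the kind of model a real video inspection system would use, and the two features, their values, and the `train` and `predict` helpers are hypothetical.

```python
from math import dist

def train(examples):
    """Learn one centroid per label from (features, label) pairs."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Answer the predefined question: which label's centroid is closest?"""
    return min(model, key=lambda label: dist(features, model[label]))

# Hypothetical hand-made features per inspection frame:
# (edge roughness, dark-patch area), each scaled to the range 0..1.
labeled_frames = [
    ((0.1, 0.0), "ok"), ((0.2, 0.1), "ok"), ((0.1, 0.2), "ok"),
    ((0.8, 0.7), "damaged"), ((0.9, 0.6), "damaged"), ((0.7, 0.9), "damaged"),
]
model = train(labeled_frames)
print(predict(model, (0.75, 0.8)))   # a new, unlabeled frame
```

The labels do all the work here: change the labels attached to the training frames and the same code learns to answer a different question.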
Supervised learning requires ample, high-quality labeled data on which to train the model. Data needs grow rapidly as the use case becomes more specific or the required confidence level rises. The concept of “garbage in, garbage out” applies to machine learning systems, too: system performance and training data quality are inextricably linked. The overall performance of even the most sophisticated model can easily be compromised if it is trained on data that is poorly labeled or does not accurately reflect the target values.
Data Labeling Techniques
There are several data labeling techniques, including:
- In-house labeling
- Data programming
- Synthetic labeling
- Data labeling platforms
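Of the techniques listed above, data programming is the easiest to show in a few lines. As the term is commonly used, it means writing small heuristic "labeling functions" whose noisy votes are combined into training labels. The sketch below is a simplification under stated assumptions: it combines votes by plain majority rather than with a learned model of each function's accuracy, and the functions, keywords, and inspection notes are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its heuristic does not apply

# Hypothetical labeling functions: each encodes one heuristic and may abstain.
def lf_mentions_crack(note):
    return "damaged" if "crack" in note.lower() else ABSTAIN

def lf_mentions_erosion(note):
    return "damaged" if "erosion" in note.lower() else ABSTAIN

def lf_no_issues(note):
    return "ok" if "no issues" in note.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_crack, lf_mentions_erosion, lf_no_issues]

def weak_label(note):
    """Majority vote over the functions; abstain if none fires or the top votes tie."""
    votes = Counter(lf(note) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)  # abstentions do not count as votes
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

notes = [
    "Hairline crack near blade root, leading-edge erosion visible",
    "Routine pass, no issues found",
    "Footage too dark to assess",
]
labels = [weak_label(n) for n in notes]
print(labels)
```

Items that end up without a label can be routed to one of the other techniques, such as in-house review.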
Attendees at all five conferences told us they struggle to determine the most suitable technique. Each has pros and cons, which we detail in our latest white paper, Supervised vs. unsupervised learning, a “how to” for choosing the right approach and data labeling technique for your ML projects.