In this piece, one of our in-house research scientists, Cameron Wolfe, provides an overview of online learning techniques, focusing on those that are most effective for the AI/ML practitioner.
Online learning, a popular research area within the deep learning community, has wide applications in industry. Scenarios in which data becomes available to a learner sequentially are very common: dynamic e-commerce recommendations, on-device learning, and federated learning are all cases where the full dataset might not be available at one time. We will tackle the topic of online learning from the viewpoint of a practitioner, answering questions such as:
- What are problems faced by online learning models?
- What are go-to solutions for training a model online?
- What level of performance should I expect from a model that is trained online?
Our objective is to make practitioners aware of the options that exist within the online learning space and to offer viable solutions for scenarios in which a model must learn from new data that is constantly becoming available. Not only does such a training setup eliminate latency-ridden, offline re-training procedures (i.e., the model is updated in real time), but it also more closely reflects how intelligent systems learn in the world around us: when a human learns a skill, they do not require several hours of GPU training to leverage their new knowledge!
See the complete white paper: Online Learning Techniques: An Overview.
What Is Online Learning?
Let’s define online learning as a training scenario in which the full dataset is never available to the model at the same time. Rather, the model is exposed to portions of the dataset sequentially and is expected to learn the full training task through these partial exposures. Typically, once the model has been exposed to a given portion of the dataset, it is not allowed to revisit that data later. Otherwise, the model could simply loop over the dataset and perform a normal training procedure.
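To make this definition concrete, here is a minimal sketch of a single-pass training loop in plain Python. It is an illustration under stated assumptions rather than a method from the white paper: the names `stream_batches` and `online_fit`, the toy linear target, the learning rate, and the batch size are all hypothetical choices made for the example. The key property is that each batch is used for exactly one update and then discarded, so the model never sees the full dataset at once and never revisits old data.

```python
import random


def stream_batches(n_batches, batch_size, seed=0):
    """Yield small batches one at a time; nothing is stored for later replay."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            # Hypothetical target for illustration: y = 3x + 0.5 plus small noise.
            y = 3.0 * x + 0.5 + rng.gauss(0.0, 0.01)
            batch.append((x, y))
        yield batch


def online_fit(batches, lr=0.1):
    """Fit y = w*x + b with one SGD step per batch, in a single pass."""
    w, b = 0.0, 0.0
    for batch in batches:
        grad_w = 0.0
        grad_b = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            # Gradient of the mean squared error over this batch.
            grad_w += 2.0 * err * x / len(batch)
            grad_b += 2.0 * err / len(batch)
        w -= lr * grad_w
        b -= lr * grad_b
        # The batch goes out of scope here; the model cannot loop back over it.
    return w, b


w, b = online_fit(stream_batches(n_batches=2000, batch_size=8))
print(f"learned w={w:.3f}, b={b:.3f}")
```

Because the data arrives through a generator, there is no dataset object to iterate over a second time, which mirrors the "no revisiting" constraint described above. A real system would replace the generator with whatever source delivers new data, such as a message queue or on-device sensor feed.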
Does the setup matter?
Given that so many different experimental scenarios exist for the study of online learning techniques, the choice of experimental setup is important. For example, models trained for incremental learning scenarios rarely perform well in the streaming setting, so it’s important to be specific about the exact learning scenario. Fortunately, the training techniques used across the different types of online learning are very similar and require only slight modifications to make them more effective in a given setting.