Object detection is a cornerstone of computer vision, closely related to both image recognition and image segmentation. Where image recognition outputs a classification label for an identified object and image segmentation builds a pixel-level understanding of the objects in a scene, object detection locates objects within images or videos, typically with bounding boxes, so that they can be tracked and counted. This capability underpins many of the most popular ML and AI applications, such as face detection, autonomous vehicles, video surveillance, and anomaly detection.
One of the most popular families of object detection methods is the R-CNN series, introduced by Ross Girshick et al. in 2014, improved upon with Fast R-CNN (2015), and then refined again with Faster R-CNN (2015). What makes Faster R-CNN both more accurate and faster than its predecessors is the introduction of the Region Proposal Network (RPN), which replaces the slow external region proposal step (selective search) used by the earlier models. The RPN is a fully convolutional network, trained end-to-end, that simultaneously predicts object bounds and objectness scores at each position of the feature map. Because the RPN is so central to Faster R-CNN, which remains one of the most widely used object detection frameworks among researchers, the bulk of this piece will focus on the RPN design and on the concepts of anchor boxes and non-maximum suppression.
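To make the idea of predicting boxes and scores at every position more concrete, here is a minimal sketch of how a grid of anchor boxes might be laid out over a feature map. The function name `generate_anchors`, the stride of 16, the three scales, and the three aspect ratios are assumptions chosen for illustration (they mirror commonly cited Faster R-CNN defaults), and placing each anchor at the centre of its cell is just one possible convention. This is not the reference implementation.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors, one set per feature-map position.

    Hypothetical helper: stride, scales, and ratios are illustrative defaults.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays at scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(2, 3)
print(len(anchors))  # 2 * 3 positions * 9 anchors each = 54
```

In a full RPN, each of these anchors would receive an objectness score and a set of box-regression offsets from the network; the sketch only covers the fixed geometry that those predictions are made relative to.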