Deep Learning for Object Detection Part II - A Deep Dive Into Fast R-CNN is the second article in our Deep Learning for Object Detection series, which explores state-of-the-art, region-based object detection methods and their evolution over time. In this piece, we look at the innovations that improve training and testing speed and overcome many of the drawbacks of the R-CNN approach.
In Part I, we saw how R-CNN, pioneered by Ross Girshick and his team, sparked new research interest in region-based object detection. The following year, they released a new paper detailing how they souped up R-CNN.
We will examine how they improved upon R-CNN and which techniques they employed, building a deeper understanding of why this method is called Fast R-CNN. We have also provided sample TensorFlow code to illustrate how tensors flow between layers and how spatial pyramid pooling works in action.
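As a preview of the pooling idea covered later, the sketch below shows how spatial pyramid pooling turns a feature map into a fixed-length vector by max-pooling over grids of several sizes and concatenating the results (Fast R-CNN's RoI pooling is the single-level special case). This is a minimal illustration, not the paper's implementation: the function name `spatial_pyramid_pool`, the pyramid levels, and the assumption that the spatial dimensions divide evenly by each level are ours, and we assume TensorFlow 2.

```python
import tensorflow as tf

def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Hypothetical SPP sketch: features is [batch, H, W, C],
    with H and W assumed divisible by every level in `levels`."""
    _, h, w, c = features.shape
    outputs = []
    for n in levels:
        # Split the H x W map into an n x n grid and max-pool each cell,
        # giving an [batch, n, n, C] tensor regardless of H and W.
        pooled = tf.nn.max_pool2d(
            features,
            ksize=[h // n, w // n],
            strides=[h // n, w // n],
            padding="VALID",
        )
        outputs.append(tf.reshape(pooled, [-1, n * n * c]))
    # Concatenating the levels yields a fixed-length descriptor:
    # (1*1 + 2*2 + 4*4) * C features per example here.
    return tf.concat(outputs, axis=1)

x = tf.random.normal([2, 8, 8, 256])
print(spatial_pyramid_pool(x).shape)  # (2, 5376) = (2, 21 * 256)
```

Because the grid sizes are fixed while the cell sizes adapt, the output length stays constant even as the input feature map's spatial size changes, which is exactly what lets a fully connected head sit on top of arbitrarily sized regions.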
R-CNN vs Fast R-CNN
The landscape of computer vision, and more specifically object detection, has changed dramatically since Convolutional Neural Networks (CNNs) were first applied to object detection around 2013. As you can see in the graph below, precision had plateaued around 40% until deep learning for object detection started to take off. Advances in research, coupled with access to faster and more powerful hardware, drove precision up rapidly, and the current state of the art is around 80%.
There was a meteoric rise in precision after CNNs were used for object detection. Image Source