In Google Research’s paper Speed/accuracy trade-offs for modern convolutional object detectors, an experimental comparison is performed across three popular object detectors (Faster R-CNN, R-FCN and SSD). Here I want to take note of some important results from this paper. Model details and configurations can be found in the paper.
Accuracy vs time
In general, R-FCN and SSD models are faster than Faster R-CNN. Among the fastest models, SSD with Inception v2 and SSD with MobileNet are the most accurate. Faster R-CNN with Inception ResNet attains the highest accuracy, but it sits at the slow end of the spectrum.
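To make the trade-off concrete, here is a minimal Python sketch of how one might pick a detector under a latency budget. The model names mirror configurations discussed in the paper, but the mAP and latency values are illustrative placeholders, not measurements from the paper.

```python
from typing import NamedTuple, Optional, Sequence


class DetectorResult(NamedTuple):
    name: str
    map_coco: float    # COCO mAP (higher is better)
    latency_ms: float  # per-image inference time (lower is better)


# Illustrative placeholders only -- NOT numbers from the paper.
results = [
    DetectorResult("ssd_mobilenet", 0.20, 30.0),
    DetectorResult("ssd_inception_v2", 0.22, 42.0),
    DetectorResult("rfcn_resnet101", 0.30, 85.0),
    DetectorResult("faster_rcnn_resnet101", 0.32, 106.0),
    DetectorResult("faster_rcnn_inception_resnet", 0.37, 620.0),
]


def best_under_budget(results: Sequence[DetectorResult],
                      budget_ms: float) -> Optional[DetectorResult]:
    """Return the most accurate model whose latency fits the budget."""
    feasible = [r for r in results if r.latency_ms <= budget_ms]
    return max(feasible, key=lambda r: r.map_coco) if feasible else None


print(best_under_budget(results, budget_ms=100.0))
```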
The effect of the feature extractor
Generally, a feature extractor with better classification accuracy gives the detector (Faster R-CNN, R-FCN) better detection accuracy. However, SSD’s performance appears to be less reliant on its feature extractor’s classification accuracy.
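One way to quantify this dependence is to correlate the feature extractor’s ImageNet top-1 accuracy with the resulting detector mAP across extractors. The sketch below uses NumPy’s corrcoef; the arrays hold hypothetical values for illustration only, not data from the paper.

```python
import numpy as np


def accuracy_correlation(classifier_top1, detector_map):
    """Pearson correlation between feature-extractor classification
    accuracy and detector mAP across a set of models."""
    return np.corrcoef(classifier_top1, detector_map)[0, 1]


# Hypothetical values for illustration only (not from the paper):
top1 = np.array([0.71, 0.74, 0.77, 0.80])        # ImageNet top-1 per extractor
frcnn_map = np.array([0.25, 0.28, 0.31, 0.34])   # Faster R-CNN mAP per extractor
ssd_map = np.array([0.20, 0.21, 0.22, 0.22])     # SSD mAP per extractor

print("Faster R-CNN:", accuracy_correlation(top1, frcnn_map))
print("SSD:         ", accuracy_correlation(top1, ssd_map))
```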
The effect of object size
All methods do better on large objects. SSD models typically perform poorly on small objects, but they can outperform Faster R-CNN and R-FCN on large objects.
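The paper reports accuracy separately for small, medium, and large objects following the COCO convention, which buckets objects by pixel area at 32² and 96². A minimal sketch of that bucketing (approximating object area by the bounding-box area, whereas COCO uses segmentation area):

```python
def coco_size_bucket(width: float, height: float) -> str:
    """Classify a ground-truth box by area using the standard COCO
    small/medium/large thresholds (32^2 and 96^2 pixels)."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


print(coco_size_bucket(20, 25))    # small
print(coco_size_bucket(60, 60))    # medium
print(coco_size_bucket(150, 120))  # large
```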
The effect of image size
The effect of the number of proposals
For Faster R-CNN and R-FCN, we can adjust the number of proposals produced by the region proposal stage. For Faster R-CNN, reducing the number of proposals leads to significant computational savings with only a small loss in accuracy; the paper finds a sweet spot at around 50 proposals. For R-FCN, the savings from fewer proposals are minimal, since the per-proposal computation is already small.
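The asymmetry is easy to see with a toy linear cost model: total time is a shared backbone/RPN cost plus a per-proposal second-stage cost. Faster R-CNN runs a relatively heavy box classifier on each proposal, while R-FCN shares almost all computation through its position-sensitive score maps, so its per-proposal cost is tiny. The millisecond figures below are hypothetical, chosen only to illustrate the shape of the trade-off.

```python
def detector_time_ms(shared_ms: float, per_proposal_ms: float,
                     num_proposals: int) -> float:
    """Toy linear cost model: shared backbone/RPN cost plus a
    per-proposal second-stage cost."""
    return shared_ms + per_proposal_ms * num_proposals


# Hypothetical costs for illustration (not measurements from the paper).
for n in (300, 100, 50):
    frcnn = detector_time_ms(shared_ms=80.0, per_proposal_ms=1.0, num_proposals=n)
    rfcn = detector_time_ms(shared_ms=80.0, per_proposal_ms=0.02, num_proposals=n)
    print(f"{n:3d} proposals  Faster R-CNN ~{frcnn:.0f} ms   R-FCN ~{rfcn:.0f} ms")
```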
FLOPs analysis
For denser block models such as ResNet-101, the FLOPs/GPU-time ratio is typically greater than 1. For Inception and MobileNet models, this ratio is typically less than 1.
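The quantity being compared is simply the model’s FLOP count divided by its measured GPU time, i.e. the effective throughput achieved by one forward pass. A minimal sketch with hypothetical numbers:

```python
def effective_throughput(gflops: float, gpu_time_ms: float) -> float:
    """FLOPs divided by measured GPU time: the effective number of
    GFLOPs executed per millisecond of wall-clock GPU time."""
    return gflops / gpu_time_ms


# Hypothetical numbers for illustration only (not measurements from the paper):
print(effective_throughput(gflops=120.0, gpu_time_ms=100.0))  # denser ResNet-101-style model
print(effective_throughput(gflops=30.0, gpu_time_ms=50.0))    # Inception/MobileNet-style model
```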
Memory analysis
IOU thresholds
Detectors that perform poorly at the higher IOU thresholds always also perform poorly at the lower IOU thresholds.
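For reference, the IOU (intersection-over-union) criterion decides whether a detection counts as a true positive; COCO mAP averages over thresholds from 0.5 to 0.95 in steps of 0.05. A self-contained sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth box when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


# IoU of these two boxes is ~0.68, so it passes at 0.5 but fails at 0.75.
print(is_true_positive((0, 0, 10, 10), (1, 1, 11, 11), threshold=0.5))   # True
print(is_true_positive((0, 0, 10, 10), (1, 1, 11, 11), threshold=0.75))  # False
```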