Comparing CNN-based object detectors

In Google Research’s paper Speed/accuracy trade-offs for modern convolutional object detectors, an experimental comparison is performed across three popular object detectors (Faster R-CNN, R-FCN, and SSD). Here I want to take note of some important results from this paper. Model details and configurations can be found in the paper.

Accuracy vs time

In general, R-FCN and SSD are faster than Faster R-CNN. Among the fastest models, SSD with Inception v2 and SSD with MobileNet are the most accurate. Faster R-CNN with Inception ResNet attains the best overall accuracy.

The effect of the feature extractor

Generally, a feature extractor with better classification accuracy gives the detector (Faster R-CNN, R-FCN) better detection accuracy. However, the performance of SSD appears to be less reliant on its feature extractor’s classification accuracy.

The effect of object size

All methods do better on large objects. SSD models typically perform poorly on small objects, but SSD can outperform Faster R-CNN and R-FCN on large objects.

The effect of image size

Reducing input resolution lowers accuracy but also reduces inference time: in the paper, halving the resolution in both dimensions decreased accuracy by roughly 16% (relative) on average while cutting inference time by roughly 27% on average. High-resolution inputs help most on small objects.

The effect of the number of proposals

For Faster R-CNN and R-FCN, we can adjust the number of proposals. For Faster R-CNN, reducing the number of proposals can lead to significant computational savings. According to the paper, the sweet spot is around 50 proposals, which retains most of the accuracy of using 300 proposals while substantially reducing running time. For R-FCN, the savings are minimal, since its per-proposal computation is already cheap.
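The proposal-count knob can be illustrated with a minimal sketch: the region proposal network scores candidate boxes, and the expensive second stage only runs on the top-N of them. The function and data below are hypothetical illustrations, not the paper’s code.

```python
# Hypothetical sketch: keep only the top-N region proposals by objectness
# score, so the detector's second stage processes fewer boxes.

def top_n_proposals(proposals, n):
    """proposals: list of (score, box) pairs; returns the n highest-scoring."""
    return sorted(proposals, key=lambda p: p[0], reverse=True)[:n]

candidates = [(0.9, "box_a"), (0.2, "box_b"), (0.7, "box_c"), (0.5, "box_d")]
kept = top_n_proposals(candidates, 2)  # only these reach the second stage
```

Lowering `n` trades a small amount of recall for a large reduction in second-stage computation, which is why the effect is pronounced for Faster R-CNN but minimal for R-FCN.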

FLOPs analysis

For denser models such as ResNet-101, the ratio of FLOPs to GPU time is typically greater than 1. For Inception and MobileNet models, this ratio is typically less than 1.
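As a rough sketch of where such FLOP counts come from, the cost of a single convolution layer can be estimated from its output size, channel counts, and kernel size. This counts each multiply-add as 2 FLOPs, a common convention (an assumption here, not stated in the paper):

```python
def conv2d_flops(out_h, out_w, out_c, in_c, k):
    """Approximate FLOPs of one k x k convolution layer, counting each
    multiply-add as 2 FLOPs (bias and padding effects ignored)."""
    return 2 * out_h * out_w * out_c * in_c * k * k

# e.g. a 3x3 conv mapping a 56x56x64 feature map to 56x56x64
flops = conv2d_flops(56, 56, 64, 64, 3)
```

Summing such estimates over all layers gives the model-level FLOP counts that the paper divides by measured GPU time.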

Memory analysis

Memory usage is highly correlated with the choice of feature extractor: MobileNet-based models require the least memory, while Inception ResNet-based models require the most.

IOU thresholds

Detectors that perform poorly at higher IOU thresholds also perform poorly at lower IOU thresholds.
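The IOU threshold determines when a detection counts as correct: a predicted box matches a ground-truth box only if their intersection-over-union exceeds the threshold. A minimal sketch of the standard computation, assuming axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# identical boxes give 1.0; disjoint boxes give 0.0
```

A higher threshold (e.g. 0.75 instead of 0.5) demands tighter localization, which is why it separates well-localized detectors from sloppy ones.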