RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
Abstract
Recently, two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless, single-shot detectors remain immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: integrating instance mask prediction for the first time, making the loss function adaptive and more stable, and including additional hard examples in training. We call the resulting augmented network RetinaMask. The detection component of RetinaMask has the same computational cost as the original RetinaNet, but is more accurate. COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs. 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation. Adding Group Normalization increases the performance of RetinaMask-101 to 41.7 mAP. Code is at: https://github.com/chengyangfu/retinamask
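Of the three training changes, the adaptive loss is the easiest to sketch in isolation. Below is a minimal, hedged sketch of a self-adjusting smooth L1 regression loss in PyTorch, where the L2-to-L1 switch point is driven by running statistics of the absolute error. The update rule, class name, and momentum parameter are illustrative assumptions, not the paper's exact implementation.

```python
import torch


class SelfAdjustingSmoothL1:
    """Smooth L1 loss whose L2-to-L1 switch point (beta) adapts to
    running statistics of the regression error. Illustrative sketch,
    not the paper's exact formulation.
    """

    def __init__(self, beta_max=1.0, momentum=0.9):
        self.beta_max = beta_max    # upper bound on the switch point
        self.momentum = momentum    # smoothing factor for running stats
        self.running_mean = 0.0
        self.running_var = 0.0

    def __call__(self, pred, target):
        diff = torch.abs(pred - target)

        # Track running mean/variance of the absolute error (no gradient),
        # then derive the switch point from them (assumed update rule).
        with torch.no_grad():
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * diff.mean().item()
            self.running_var = m * self.running_var + (1 - m) * diff.var().item()
        beta = min(max(self.running_mean - self.running_var, 1e-3), self.beta_max)

        # Standard smooth L1 with an adaptive beta: quadratic below the
        # switch point, linear above it.
        loss = torch.where(diff < beta,
                           0.5 * diff ** 2 / beta,
                           diff - 0.5 * beta)
        return loss.mean()


# Usage: regression loss between predicted and target box offsets.
loss_fn = SelfAdjustingSmoothL1()
pred = torch.randn(8, 4, requires_grad=True)
target = torch.randn(8, 4)
loss = loss_fn(pred, target)
loss.backward()
```

Because beta shrinks as training stabilizes, the loss gradually behaves more like plain L1 on well-fit examples while staying quadratic (and hence gentler) on large early-training errors.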