
# SHIFT Model Zoo

We provide various models trained using Vis4D on the SHIFT dataset.

## Semantic Segmentation

The semantic segmentation task involves predicting a segmentation mask for each image that assigns a class label to every pixel. The SHIFT dataset contains fine-grained semantic segmentation annotations from different domains. For details about downloading the data and the annotation format for this task, see the official documentation.
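The mIoU-val numbers reported in the tables below follow the standard confusion-matrix definition of mean intersection-over-union. For reference, here is a minimal NumPy sketch of that metric (a generic illustration, not Vis4D's evaluator; the `ignore_label` default of 255 is an assumption):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
             ignore_label: int = 255) -> float:
    """Mean IoU over classes from per-pixel label maps of equal shape."""
    mask = gt != ignore_label  # drop unlabeled pixels
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return float((inter[valid] / union[valid]).mean())
```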

### Semantic FPN

[Panoptic Feature Pyramid Networks](https://arxiv.org/abs/1901.02446) [CVPR 2019]

Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

**Abstract:** The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.
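To make the semantic branch described above concrete, the sketch below upsamples each FPN level to 1/4 scale, sums the levels, and applies a final convolution to predict per-pixel class logits, following the design in the paper. This is a minimal PyTorch illustration; the `SemanticFPNHead` name and the channel/norm defaults are assumptions, not Vis4D's implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class SemanticFPNHead(nn.Module):
    """Semantic branch of Panoptic FPN: merge FPN levels at 1/4 scale,
    then predict a class for every pixel of the input image."""

    def __init__(self, num_classes: int, in_channels: int = 256,
                 mid_channels: int = 128):
        super().__init__()
        self.stages = nn.ModuleList()
        for n_ups in range(4):  # P2, P3, P4, P5 at strides 4, 8, 16, 32
            layers, ch = [], in_channels
            for _ in range(max(n_ups, 1)):
                layers += [nn.Conv2d(ch, mid_channels, 3, padding=1),
                           nn.GroupNorm(32, mid_channels),
                           nn.ReLU(inplace=True)]
                if n_ups > 0:  # 2x bilinear upsampling per stage until stride 4
                    layers.append(nn.Upsample(scale_factor=2, mode="bilinear",
                                              align_corners=False))
                ch = mid_channels
            self.stages.append(nn.Sequential(*layers))
        self.classifier = nn.Conv2d(mid_channels, num_classes, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Element-wise sum of all levels once they reach stride 4.
        merged = sum(stage(f) for stage, f in zip(self.stages, feats))
        # Final 4x upsampling back to the input resolution.
        return F.interpolate(self.classifier(merged), scale_factor=4,
                             mode="bilinear", align_corners=False)
```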

#### Results

**Clear-daytime domain.** SHIFT has 20k images for training and 3k images for validation under the clear-daytime domain. Below are the results of models trained and tested in this domain.

| Base network | Iters | Train crop size | mIoU-val | Scores-val | Config | Weights | Preds | Log |
| :----------: | :---: | :-------------: | :------: | :--------: | :----: | :-----: | :---: | :-: |
| R-50-FPN | 40K | 512 × 1024 | 80.71 | scores | config | model | pred | log |
| R-50-FPN | 160K | 512 × 1024 | 85.28 | scores | config | model | pred | log |

**All domains.** SHIFT has 150k frames for training and 25k frames for validation across all domains. Below are the results of models trained and tested in all domains.

| Base network | Iters | Train crop size | mIoU-val | Scores-val | Config | Weights | Preds | Log |
| :----------: | :---: | :-------------: | :------: | :--------: | :----: | :-----: | :---: | :-: |
| R-50-FPN | 40K | 512 × 1024 | 74.22 | scores | config | model | pred | log |
| R-50-FPN | 160K | 512 × 1024 | 78.87 | scores | config | model | pred | log |

## Object Detection

The object detection task involves localization (predicting a bounding box for each object) and classification (predicting the object category). The SHIFT dataset contains fine-grained object detection annotations from different domains. For details about downloading the data and the annotation format for this task, see the official documentation.
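The Box AP numbers in the tables below are computed by matching predicted boxes to ground truth via intersection-over-union (IoU). As a reference, here is a minimal NumPy sketch of pairwise box IoU for boxes in (x1, y1, x2, y2) format (a generic illustration, not the Vis4D evaluator):

```python
import numpy as np

def box_iou(boxes1: np.ndarray, boxes2: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    # Intersection corners, broadcast to an (N, M) grid of box pairs.
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)  # zero out non-overlapping pairs
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)
```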

### Faster R-CNN

[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) [NeurIPS 2015]

Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

**Abstract:** State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
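To make the RPN design in the abstract concrete, here is a minimal PyTorch sketch of its head: a shared 3x3 convolution followed by sibling 1x1 convolutions that emit objectness scores and box-regression deltas for every anchor at every spatial position. The class name and the channel/anchor defaults are illustrative assumptions, not Vis4D's exact configuration.

```python
import torch
from torch import nn

class RPNHead(nn.Module):
    """Fully convolutional RPN head: predicts, per anchor and position,
    an objectness score and 4 box-regression deltas (dx, dy, dw, dh)."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, 1)
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, 1)

    def forward(self, feats: list[torch.Tensor]):
        scores, deltas = [], []
        for f in feats:  # weights are shared across all feature levels
            x = torch.relu(self.conv(f))
            scores.append(self.objectness(x))   # (N, A, H, W)
            deltas.append(self.bbox_deltas(x))  # (N, A*4, H, W)
        return scores, deltas
```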

#### Results

**Clear-daytime domain.** SHIFT has 20k images for training and 3k images for validation under the clear-daytime domain. Below are the results of models trained and tested in this domain.

| Base network | Lr schd | Box AP-val | Scores-val | Config | Weights | Pred | Log |
| :----------: | :-----: | :--------: | :--------: | :----: | :-----: | :--: | :-: |
| R-50-FPN | 12e | 45.7 | scores | config | model | pred | log |
| R-50-FPN | 36e | 46.0 | scores | config | model | pred | log |

**All domains.** SHIFT has 150k frames for training and 25k frames for validation across all domains. Below are the results of models trained and tested in all domains.

| Base network | Lr schd | Box AP-val | Scores-val | Config | Weights | Pred | Log |
| :----------: | :-----: | :--------: | :--------: | :----: | :-----: | :--: | :-: |
| R-50-FPN | 6e | 49.6 | scores | config | model | pred | log |