
# SHIFT Model Zoo

We provide various models trained using Vis4D on the SHIFT dataset.

## Semantic Segmentation

The semantic segmentation task involves predicting a segmentation mask for each image that assigns a class label to every pixel. The SHIFT dataset contains fine-grained semantic segmentation annotations from different domains. For details about downloading the data and the annotation format for this task, see the official documentation.
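The mIoU-val numbers reported in the tables below follow the standard confusion-matrix definition of mean intersection-over-union. For reference, here is a minimal NumPy sketch of that metric (a generic illustration, not Vis4D's evaluator; the `ignore_label` default of 255 is an assumption):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
             ignore_label: int = 255) -> float:
    """Mean IoU over classes from per-pixel label maps of equal shape."""
    mask = gt != ignore_label  # drop unlabeled pixels
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return float((inter[valid] / union[valid]).mean())
```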

### Semantic FPN

[Panoptic Feature Pyramid Networks](https://arxiv.org/abs/1901.02446) [CVPR 2019]

Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

**Abstract:** The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.
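To make the semantic branch described above concrete, the sketch below upsamples each FPN level to 1/4 scale, sums the levels, and applies a final convolution to predict per-pixel class logits, following the design in the paper. This is a minimal PyTorch illustration; the `SemanticFPNHead` name and the channel/norm defaults are assumptions, not Vis4D's implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class SemanticFPNHead(nn.Module):
    """Semantic branch of Panoptic FPN: merge FPN levels at 1/4 scale,
    then predict a class for every pixel of the input image."""

    def __init__(self, num_classes: int, in_channels: int = 256,
                 mid_channels: int = 128):
        super().__init__()
        self.stages = nn.ModuleList()
        for n_ups in range(4):  # P2, P3, P4, P5 at strides 4, 8, 16, 32
            layers, ch = [], in_channels
            for _ in range(max(n_ups, 1)):
                layers += [nn.Conv2d(ch, mid_channels, 3, padding=1),
                           nn.GroupNorm(32, mid_channels),
                           nn.ReLU(inplace=True)]
                if n_ups > 0:  # 2x bilinear upsampling per stage until stride 4
                    layers.append(nn.Upsample(scale_factor=2, mode="bilinear",
                                              align_corners=False))
                ch = mid_channels
            self.stages.append(nn.Sequential(*layers))
        self.classifier = nn.Conv2d(mid_channels, num_classes, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Element-wise sum of all levels once they reach stride 4.
        merged = sum(stage(f) for stage, f in zip(self.stages, feats))
        # Final 4x upsampling back to the input resolution.
        return F.interpolate(self.classifier(merged), scale_factor=4,
                             mode="bilinear", align_corners=False)
```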

#### Results

**Clear-daytime domain.** SHIFT has 20k images for training and 3k images for validation under the clear-daytime domain. Below are the results of models trained and tested in this domain.

| Base network | Iters | Train crop size | mIoU-val | Scores-val | Config | Weights | Preds | Log |
| :----------: | :---: | :-------------: | :------: | :--------: | :----: | :-----: | :---: | :-: |
| R-50-FPN | 40K | 512 × 1024 | 80.71 | scores | config | model | pred | log |
| R-50-FPN | 160K | 512 × 1024 | 85.28 | scores | config | model | pred | log |

**All domains.** SHIFT has 150k frames for training and 25k frames for validation across all domains. Below are the results of models trained and tested in all domains.

| Base network | Iters | Train crop size | mIoU-val | Scores-val | Config | Weights | Preds | Log |
| :----------: | :---: | :-------------: | :------: | :--------: | :----: | :-----: | :---: | :-: |
| R-50-FPN | 40K | 512 × 1024 | 74.22 | scores | config | model | pred | log |
| R-50-FPN | 160K | 512 × 1024 | 78.87 | scores | config | model | pred | log |

## Object Detection

The object detection task involves localization (predicting a bounding box for each object) and classification (predicting the object category). The SHIFT dataset contains fine-grained object detection annotations from different domains. For details about downloading the data and the annotation format for this task, see the official documentation.
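The Box AP numbers in the tables below are computed by matching predicted boxes to ground truth via intersection-over-union (IoU). As a reference, here is a minimal NumPy sketch of pairwise box IoU for boxes in (x1, y1, x2, y2) format (a generic illustration, not the Vis4D evaluator):

```python
import numpy as np

def box_iou(boxes1: np.ndarray, boxes2: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    # Intersection corners, broadcast to an (N, M) grid of box pairs.
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)  # zero out non-overlapping pairs
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)
```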

### Faster R-CNN

[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) [NeurIPS 2015]

Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

**Abstract:** State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
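To make the RPN design in the abstract concrete, here is a minimal PyTorch sketch of its head: a shared 3x3 convolution followed by sibling 1x1 convolutions that emit objectness scores and box-regression deltas for every anchor at every spatial position. The class name and the channel/anchor defaults are illustrative assumptions, not Vis4D's exact configuration.

```python
import torch
from torch import nn

class RPNHead(nn.Module):
    """Fully convolutional RPN head: predicts, per anchor and position,
    an objectness score and 4 box-regression deltas (dx, dy, dw, dh)."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, 1)
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, 1)

    def forward(self, feats: list[torch.Tensor]):
        scores, deltas = [], []
        for f in feats:  # weights are shared across all feature levels
            x = torch.relu(self.conv(f))
            scores.append(self.objectness(x))   # (N, A, H, W)
            deltas.append(self.bbox_deltas(x))  # (N, A*4, H, W)
        return scores, deltas
```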

#### Results

**Clear-daytime domain.** SHIFT has 20k images for training and 3k images for validation under the clear-daytime domain. Below are the results of models trained and tested in this domain.

| Base network | Lr schd | Box AP-val | Scores-val | Config | Weights | Pred | Log |
| :----------: | :-----: | :--------: | :--------: | :----: | :-----: | :--: | :-: |
| R-50-FPN | 12e | 45.7 | scores | config | model | pred | log |
| R-50-FPN | 36e | 46.0 | scores | config | model | pred | log |

**All domains.** SHIFT has 150k frames for training and 25k frames for validation across all domains. Below are the results of models trained and tested in all domains.

| Base network | Lr schd | Box AP-val | Scores-val | Config | Weights | Pred | Log |
| :----------: | :-----: | :--------: | :--------: | :----: | :-----: | :--: | :-: |
| R-50-FPN | 6e | 49.6 | scores | config | model | pred | log |