# RIS-DMMI

This repository provides the PyTorch implementation of DMMI in the following paper:

__Beyond One-to-One: Rethinking the Referring Image Segmentation (ICCV 2023)__
# News

* 2023.10.03 - The final version of our dataset has been released. Please remember to download the latest version.
* 2023.10.03 - We release our code.

# Dataset

We collect a new comprehensive dataset, Ref-ZOM (**Z**ero/**O**ne/**M**any), which contains image-text pairs under one-to-zero, one-to-one, and one-to-many conditions. As with RefCOCO, RefCOCO+, and G-Ref, all images in Ref-ZOM are selected from the COCO dataset. Here, we provide the text, image, and annotation information of Ref-ZOM, which should be used together with COCO trainval2014.
Our dataset can be downloaded from:
[[Baidu Cloud](https://pan.baidu.com/s/1CxPYGWEadHhcViTH2iI7jw?pwd=g7uu)] [[Google Drive](https://drive.google.com/drive/folders/1FaH6U5pywSf0Ufnn4lYIVaykYxqU2vrA?usp=sharing)]
Remember to download the original COCO dataset from:
[[COCO Download](https://cocodataset.org/#download)]
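
For example, a minimal sketch for fetching the 2014 image sets (the zip URLs are taken from the COCO download page):

```
# Download the 2014 train/val image zips from the COCO mirror and unpack them.
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
unzip -q train2014.zip
unzip -q val2014.zip
```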
# Code

**Prepare**
* Download COCO_train2014 and COCO_val2014, and merge the two sets into a new folder "trainval2014" (see the sketch after this list). Then, at Line 52 of `/refer/refer.py`, assign the path of this folder to `self.Image_DIR`.
* Download "Ref-ZOM(final).p" and rename it to "refs(final).p". Then put `refs(final).p` and `instances.json` into `/refer/data/ref-zom/`.
* Prepare BERT following [LAVT](https://github.com/yz93/LAVT-RIS).
* Prepare RefCOCO, RefCOCO+ and RefCOCOg following [LAVT](https://github.com/yz93/LAVT-RIS).
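
As referenced above, a minimal sketch of these preparation steps; the source and destination paths are placeholders, so point `self.Image_DIR` at wherever you place `trainval2014`.

```
# Sketch only -- adjust paths to your own layout.
mkdir -p ./trainval2014
cp -r ./train2014/. ./trainval2014/   # or use symlinks to save disk space
cp -r ./val2014/. ./trainval2014/

# Place the Ref-ZOM annotations (after renaming "Ref-ZOM(final).p" to "refs(final).p").
mkdir -p ./refer/data/ref-zom
cp "refs(final).p" instances.json ./refer/data/ref-zom/
```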
**Train**

* Remember to change `--output_dir` and `--pretrained_backbone` to your own paths.
* Use `--model` to select the backbone: 'dmmi-swin' for Swin-Base and 'dmmi_res' for ResNet-50.
* Use `--dataset`, `--splitBy` and `--split` to select the dataset as follows:
```
# Refcoco
--dataset refcoco, --splitBy unc, --split val
# Refcoco+
--dataset refcoco+, --splitBy unc, --split val
# Refcocog(umd)
--dataset refcocog, --splitBy umd, --split val
# Refcocog(google)
--dataset refcocog, --splitBy google, --split val
# Ref-zom
--dataset ref-zom, --splitBy final, --split test
```
* Begin training!!
```
sh train.sh
```
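
If you would rather call the entry point directly instead of editing `train.sh`, a hedged example might look like the following; the script name `train.py` and the paths are assumptions, so check `train.sh` for the actual command and defaults.

```
# Sketch only: `train.py`, the output directory and the backbone path are assumptions -- see train.sh.
python train.py --model dmmi-swin \
                --dataset refcocog --splitBy umd --split val \
                --output_dir /path/to/output_dir \
                --pretrained_backbone /path/to/pretrained_backbone
```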
**Test**

* Remember to change `--test_parameter` to your checkpoint path. Meanwhile, set `--model`, `--dataset`, `--splitBy` and `--split` properly.
* Begin testing!!
```
sh test.sh
```
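
Likewise, a hedged sketch of a direct evaluation call; the script name `test.py` and the checkpoint path are assumptions, so check `test.sh` for the actual command.

```
# Sketch only: `test.py` and the checkpoint path are assumptions -- see test.sh.
python test.py --model dmmi-swin \
               --dataset refcocog --splitBy umd --split val \
               --test_parameter /path/to/checkpoint.pth
```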
# Parameter

**Refcocog(umd)**

| Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| ResNet-101 | 59.02 | 62.59 | [Link](https://drive.google.com/file/d/1ziDIeioglD08QQyL-_yGFFlao3PtcJJS/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1uKJ-Wu5TtJhphXNOXo3mIA?pwd=6cgb) |
| Swin-Base | 63.46 | 66.48 | [Link](https://drive.google.com/file/d/1uuGWSYLGYa_qMxTlnZxH6p9FMxQLOQfZ/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1eAT0NgkID4qXpoXMf2bjEg?pwd=bq7w) |

**Ref-ZOM**
| Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Swin-Base | 68.77 | 68.25 | [Link](https://drive.google.com/file/d/1Ut_E-Fru0bCmjtaC2YhgOLZ7eJorOOpi/view?usp=drive_link) | [Link](https://pan.baidu.com/s/1T-u55rpbc4_CNEXmsA-OJg?pwd=hc6e) |

# Acknowledgements

We sincerely appreciate the wonderful work of [LAVT](https://github.com/yz93/LAVT-RIS). Our code is partially built on this code base. If you find our work helpful, we suggest you refer to [LAVT](https://github.com/yz93/LAVT-RIS) and cite it as well.
# Citation

If you find our work helpful and want to cite it, please use the following citation.
```
@InProceedings{Hu_2023_ICCV,
    author    = {Hu, Yutao and Wang, Qixiong and Shao, Wenqi and Xie, Enze and Li, Zhenguo and Han, Jungong and Luo, Ping},
    title     = {Beyond One-to-One: Rethinking the Referring Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4067-4077}
}
```