jiang committed · Commit 3c6babc · Parent(s): 650c5f6
update
README.md CHANGED
@@ -1,185 +1,13 @@
## :notes: Introduction
PolyFormer is a unified model for referring image segmentation (predicting a polygon vertex sequence) and referring expression comprehension (predicting bounding-box corner points). The predicted polygons are converted to segmentation masks at the end.
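
The polygon-to-mask conversion can be done with standard rasterization; below is a minimal sketch using PIL (illustrative only; the conversion code actually used in this repo may differ).

```python
# Minimal sketch: rasterize a polygon (list of (x, y) vertices) into a binary
# mask. Illustrative only; the repo's own conversion code may differ.
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(vertices, height, width):
    """vertices: [(x0, y0), (x1, y1), ...] in pixel coordinates."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon(vertices, outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)

mask = polygon_to_mask([(10, 10), (90, 15), (70, 80)], height=100, width=100)
print(mask.shape, mask.sum())
```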
**Contributions:**

* State-of-the-art results on referring image segmentation and referring expression comprehension across 6 datasets;
* A unified framework for referring image segmentation (RIS) and referring expression comprehension (REC), formulating both as a sequence-to-sequence (seq2seq) prediction problem;
* A regression-based decoder for accurate coordinate prediction, which outputs continuous 2D coordinates directly and avoids quantization error (a toy sketch of this idea follows the list).
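
To make the last contribution concrete, the sketch below contrasts a quantized classification head with a continuous regression head for a single 2D point. It is a toy illustration of the idea, not the actual PolyFormer decoder.

```python
# Toy contrast between coordinate prediction via classification over quantized
# bins and via direct regression. Illustration only, not the PolyFormer decoder.
import torch
import torch.nn as nn

hidden_dim, num_bins = 256, 64

# (a) Quantized: classify x and y into one of `num_bins` bins each, so the
#     prediction can be off by up to half a bin width.
quantized_head = nn.Linear(hidden_dim, 2 * num_bins)

# (b) Regression: predict continuous (x, y) in [0, 1] directly.
regression_head = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, 2), nn.Sigmoid(),
)

feature = torch.randn(1, hidden_dim)
logits = quantized_head(feature).view(1, 2, num_bins)
xy_quantized = (logits.argmax(-1).float() + 0.5) / num_bins  # bin centers
xy_continuous = regression_head(feature)
print(xy_quantized, xy_continuous)
```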
## Getting Started

### Installation
```bash
conda create -n polyformer python=3.7.4
conda activate polyformer
python -m pip install -r requirements.txt
```

Note: if you are getting import errors from `fairseq`, try the following:

```bash
python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt
```
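
A quick way to confirm the environment imports cleanly (run inside the `polyformer` environment created above):

```python
# Quick sanity check that the key dependencies import; run inside the
# `polyformer` environment created above.
import torch
import fairseq

print("torch", torch.__version__)
print("fairseq", fairseq.__version__)
```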
## Datasets

### Prepare Pretraining Data
1. Create the dataset folders
```bash
mkdir datasets
mkdir datasets/images
mkdir datasets/annotations
```
2. Download the *2014 Train images [83K/13GB]* from [COCO](https://cocodataset.org/#download), the original [Flickr30K images](http://shannon.cs.illinois.edu/DenotationGraph/), the [ReferItGame images](https://drive.google.com/file/d/1R6Tm7tQTHCil6A_eOhjudK3rgaBxkD2t/view?usp=sharing), and the [Visual Genome images](http://visualgenome.org/api/v0/api_home.html), and extract them to `datasets/images`.
3. Download the annotation file for the pretraining datasets, [instances.json](https://drive.google.com/drive/folders/1O4hzL8_s3aUsnj_JZnM3CwANd7TejcJO), provided by [SeqTR](https://github.com/sean-zhuh/SeqTR), and store it in `datasets/annotations`.
The workspace directory should be organized like this (a quick layout check is sketched after this list):
```
PolyFormer/
├── datasets/
│   ├── images
│   │   ├── flickr30k/*.jpg
│   │   ├── mscoco/
│   │   │   └── train2014/*.jpg
│   │   ├── saiaprtc12/*.jpg
│   │   └── visual-genome/*.jpg
│   └── annotations
│       └── instances.json
└── ...
```
4. Generate the tsv files for pretraining
```bash
python data/create_pretraining_data.py
```
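
Before running the script above, you can optionally verify that the layout matches the tree in step 3 with a small helper like the one below (not part of the repo):

```python
# Hypothetical helper (not part of the repo): verify the directory layout from
# step 3 before generating the pretraining tsv files.
from pathlib import Path

expected = [
    "datasets/images/flickr30k",
    "datasets/images/mscoco/train2014",
    "datasets/images/saiaprtc12",
    "datasets/images/visual-genome",
    "datasets/annotations/instances.json",
]
missing = [p for p in expected if not Path(p).exists()]
print("all paths present" if not missing else f"missing: {missing}")
```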
### Prepare Finetuning Data
1. Follow the instructions in the `./refer` directory to set up subdirectories and download annotations. This directory is based on the [refer](https://github.com/lichengunc/refer) API (an optional load check is sketched after this list).

2. Generate the tsv files for finetuning
```bash
python data/create_finetuning_data.py
```
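
As an optional check that step 1 succeeded, the annotations can be loaded through the refer API. The snippet below assumes the standard `lichengunc/refer` interface and the usual `refcoco`/`unc` split names; adjust the import path and `data_root` to your checkout.

```python
# Optional check (assumes the standard lichengunc/refer API; adjust the import
# path and data_root to match your checkout) that the annotations load.
from refer import REFER

refer = REFER(data_root="refer/data", dataset="refcoco", splitBy="unc")
ref_ids = refer.getRefIds(split="val")
print(f"refcoco val: {len(ref_ids)} refs loaded")
```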
## Pretraining
1. Create the checkpoints folder
```bash
mkdir weights
```
2. Download the pretrained weights of [Swin-base](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth), [Swin-large](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth), and [BERT-base](https://cdn.huggingface.co/bert-base-uncased-pytorch_model.bin), and put the weight files in `./pretrained_weights`. These weights are needed to initialize the model for training (a quick load check is sketched after this list).

3. Run the pretraining scripts to pretrain the model on the referring expression comprehension task:
```bash
cd run_scripts/pretrain
bash pretrain_polyformer_b.sh # for pretraining PolyFormer-B model
bash pretrain_polyformer_l.sh # for pretraining PolyFormer-L model
```
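
Optionally, you can verify that the downloaded initialization weights deserialize before launching pretraining. The filenames below assume the defaults from the download links in step 2; adjust them if you renamed the files.

```python
# Optional check that the downloaded initialization weights deserialize.
# Filenames assume the defaults from the download links in step 2.
import torch

for path in [
    "pretrained_weights/swin_base_patch4_window12_384_22k.pth",
    "pretrained_weights/swin_large_patch4_window12_384_22k.pth",
    "pretrained_weights/bert-base-uncased-pytorch_model.bin",
]:
    state = torch.load(path, map_location="cpu")
    keys = list(state.keys()) if isinstance(state, dict) else []
    print(path, len(keys), "top-level entries")
```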
## Finetuning
Run the finetuning scripts to finetune the model on the referring image segmentation and referring expression comprehension tasks:
```bash
cd run_scripts/finetune
bash train_polyformer_b.sh # for finetuning PolyFormer-B model
bash train_polyformer_l.sh # for finetuning PolyFormer-L model
```
Please make sure the pretrained-weight paths in the finetuning scripts (Line 20) point to your best pretraining checkpoints.
## Evaluation
Run the evaluation scripts to evaluate on the referring image segmentation and referring expression comprehension tasks:
```bash
cd run_scripts/evaluation

# for evaluating PolyFormer-B model
bash evaluate_polyformer_b_refcoco.sh
bash evaluate_polyformer_b_refcoco+.sh
bash evaluate_polyformer_b_refcocog.sh

# for evaluating PolyFormer-L model
bash evaluate_polyformer_l_refcoco.sh
bash evaluate_polyformer_l_refcoco+.sh
bash evaluate_polyformer_l_refcocog.sh
```
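
The Model Zoo tables below report oIoU, mIoU, and Prec@0.5. For reference, the sketch below shows how these metrics are conventionally defined in the referring-segmentation literature; the repo's evaluation scripts remain the authoritative implementation.

```python
# Conventional definitions of oIoU, mIoU, and Prec@0.5 over binary masks.
# Sketch for reference only; the repo's evaluation scripts are authoritative.
import numpy as np

def ris_metrics(pred_masks, gt_masks, thresh=0.5):
    """pred_masks, gt_masks: sequences of boolean arrays of matching shapes."""
    inters, unions, ious = 0.0, 0.0, []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        inters += inter
        unions += union
        ious.append(inter / union if union > 0 else 0.0)
    ious = np.array(ious)
    return {
        "oIoU": inters / unions,            # cumulative intersection / cumulative union
        "mIoU": ious.mean(),                # mean of per-sample IoU
        f"Prec@{thresh}": (ious > thresh).mean(),  # fraction of samples above the IoU threshold
    }
```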
## Model Zoo
Download the model weights to `./weights` if you want to use our trained models for finetuning and evaluation.

|       | Refcoco val |      |          | Refcoco testA |      |          | Refcoco testB |      |          |
|-------|-------------|------|----------|---------------|------|----------|---------------|------|----------|
| Model | oIoU        | mIoU | Prec@0.5 | oIoU          | mIoU | Prec@0.5 | oIoU          | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9/view?usp=share_link) | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
| [PolyFormer-L](https://drive.google.com/file/d/15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3/view?usp=share_link) | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |

|       | Refcoco+ val |      |          | Refcoco+ testA |      |          | Refcoco+ testB |      |          |
|-------|--------------|------|----------|----------------|------|----------|----------------|------|----------|
| Model | oIoU         | mIoU | Prec@0.5 | oIoU           | mIoU | Prec@0.5 | oIoU           | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |

|       | Refcocog val |      |          | Refcocog test |      |          |
|-------|--------------|------|----------|---------------|------|----------|
| Model | oIoU         | mIoU | Prec@0.5 | oIoU          | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |

* Pretrained weights:
  * [PolyFormer-B](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
  * [PolyFormer-L](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
# Acknowledgement
This codebase is developed based on [OFA](https://github.com/OFA-Sys/OFA).
Other related codebases include:
* [Fairseq](https://github.com/pytorch/fairseq)
* [refer](https://github.com/lichengunc/refer)
* [LAVT-RIS](https://github.com/yz93/LAVT-RIS/)
* [SeqTR](https://github.com/sean-zhuh/SeqTR)

# Citation
Please cite our paper if you find this codebase helpful :)

```
@inproceedings{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  booktitle={CVPR},
  year={2023}
}
```
## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.
---
title: PolyFormer
emoji: 🖌️🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 3.14.0
app_file: app.py
pinned: false
license: afl-3.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference