# Data Preparation

Create a new directory `data` to store all the datasets.
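
For example, from the ReferFormer repository root:

```
mkdir data
```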

## Ref-COCO

Download the dataset from the official [COCO website](https://cocodataset.org/#download); RefCOCO/+/g use the COCO 2014 train split.
Download the referring-expression annotation files from [refer](https://github.com/lichengunc/refer).
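
A minimal download sketch for the images (the URL below is the standard COCO image mirror; the refer annotations still need to be prepared following that repository's instructions):

```
mkdir -p data/coco && cd data/coco
# COCO 2014 training images (~13 GB)
wget http://images.cocodataset.org/zips/train2014.zip
unzip -q train2014.zip
```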

Convert the annotation files:

```
python3 tools/data/convert_refexp_to_coco.py
```

Finally, we expect the directory structure to be the following:

```
ReferFormer
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ coco
β”‚   β”‚   β”œβ”€β”€ train2014
β”‚   β”‚   β”œβ”€β”€ refcoco
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcoco_train.json
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcoco_val.json
β”‚   β”‚   β”œβ”€β”€ refcoco+
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcoco+_train.json
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcoco+_val.json
β”‚   β”‚   β”œβ”€β”€ refcocog
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcocog_train.json
β”‚   β”‚   β”‚   β”œβ”€β”€ instances_refcocog_val.json
```


## Ref-Youtube-VOS

Download the dataset from the competition website [here](https://competitions.codalab.org/competitions/29139#participate-get_data).
Then, extract and organize the files (a minimal extraction sketch follows the directory tree below). We expect the directory structure to be the following:

```
ReferFormer
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ ref-youtube-vos
β”‚   β”‚   β”œβ”€β”€ meta_expressions
β”‚   β”‚   β”œβ”€β”€ train
β”‚   β”‚   β”‚   β”œβ”€β”€ JPEGImages
β”‚   β”‚   β”‚   β”œβ”€β”€ Annotations
β”‚   β”‚   β”‚   β”œβ”€β”€ meta.json
β”‚   β”‚   β”œβ”€β”€ valid
β”‚   β”‚   β”‚   β”œβ”€β”€ JPEGImages
```
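
A minimal extraction sketch that produces the layout above; the archive names are assumptions based on the CodaLab download page and may differ from the files you actually receive:

```
mkdir -p data/ref-youtube-vos && cd data/ref-youtube-vos
# assumed archive names; adjust to match the CodaLab downloads
unzip -o train.zip
unzip -o valid.zip
unzip -o meta_expressions.zip
```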

## Ref-DAVIS17

Download the DAVIS 2017 dataset from the [website](https://davischallenge.org/davis2017/code.html). Note that you only need to download the two zip files `DAVIS-2017-Unsupervised-trainval-480p.zip` and `DAVIS-2017_semantics-480p.zip`.
Download the text annotations from the [website](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/video-segmentation/video-object-segmentation-with-language-referring-expressions).
Then, put the three zip files in the directory as follows.


```
ReferFormer
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ ref-davis
β”‚   β”‚   β”œβ”€β”€ DAVIS-2017_semantics-480p.zip
β”‚   β”‚   β”œβ”€β”€ DAVIS-2017-Unsupervised-trainval-480p.zip
β”‚   β”‚   β”œβ”€β”€ davis_text_annotations.zip
```

Unzip these zip files.
```
unzip -o davis_text_annotations.zip
unzip -o DAVIS-2017_semantics-480p.zip
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
```

Preprocess the dataset into Ref-Youtube-VOS format (make sure you run this from the main directory):

```
python tools/data/convert_davis_to_ytvos.py
```

Finally, unzip `DAVIS-2017-Unsupervised-trainval-480p.zip` again, since the preprocessing script moves files with `mv` for efficiency and the original folders need to be restored.

```
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
```

## A2D-Sentences

Follow the instructions and download the dataset from the website [here](https://kgavrilyuk.github.io/publication/actor_action/).
Then, extract the files. Additionally, we use the same JSON annotation files generated by [MTTR](https://github.com/mttr2021/MTTR); please download them from [onedrive](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/wjn922_connect_hku_hk/EnvcpWsMsY5NrMF5If3F6DwBseMrqmzQwpTtL8HXoLAChw?e=Vlv1et).
We expect the directory structure to be the following:

```
ReferFormer
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ a2d_sentences
β”‚   β”‚   β”œβ”€β”€ Release
β”‚   β”‚   β”œβ”€β”€ text_annotations
β”‚   β”‚   β”‚   β”œβ”€β”€ a2d_annotation_with_instances
β”‚   β”‚   β”‚   β”œβ”€β”€ a2d_annotation.txt
β”‚   β”‚   β”‚   β”œβ”€β”€ a2d_missed_videos.txt
β”‚   β”‚   β”œβ”€β”€ a2d_sentences_single_frame_test_annotations.json
β”‚   β”‚   β”œβ”€β”€ a2d_sentences_single_frame_train_annotations.json
β”‚   β”‚   β”œβ”€β”€ a2d_sentences_test_annotations_in_coco_format.json
```
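
A minimal extraction sketch that produces the layout above; the archive name `A2D_main_1_0.tar.gz` is an assumption based on the A2D release page and may differ:

```
mkdir -p data/a2d_sentences && cd data/a2d_sentences
# assumed archive name for the video release; adjust if it differs
tar -xzf A2D_main_1_0.tar.gz            # creates the Release folder
mkdir -p text_annotations
# place a2d_annotation.txt, a2d_missed_videos.txt and the
# a2d_annotation_with_instances folder inside text_annotations,
# and the MTTR json files directly under a2d_sentences
```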

## JHMDB-Sentences

Follow the instructions and download the dataset from the website [here](https://kgavrilyuk.github.io/publication/actor_action/).
Then, extract the files. Additionally, we use the same JSON annotation files generated by [MTTR](https://github.com/mttr2021/MTTR); please download them from [onedrive](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/wjn922_connect_hku_hk/EjPyzXq93s5Jm4GU07JrWIMBb6nObY8fEmLyuiGg-0uBtg?e=GsZ6jP).
We expect the directory structure to be the following:

```
ReferFormer
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ jhmdb_sentences
β”‚   β”‚   β”œβ”€β”€ Rename_Images
β”‚   β”‚   β”œβ”€β”€ puppet_mask
β”‚   β”‚   β”œβ”€β”€ jhmdb_annotation.txt
β”‚   β”‚   β”œβ”€β”€ jhmdb_sentences_samples_metadata.json
β”‚   β”‚   β”œβ”€β”€ jhmdb_sentences_gt_annotations_in_coco_format.json
```
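
A minimal extraction sketch that produces the layout above; the archive names `Rename_Images.tar.gz` and `puppet_mask.zip` are assumptions based on the JHMDB download page and may differ:

```
mkdir -p data/jhmdb_sentences && cd data/jhmdb_sentences
# assumed archive names from the JHMDB download page; adjust if they differ
tar -xzf Rename_Images.tar.gz
unzip -o puppet_mask.zip
# place jhmdb_annotation.txt and the MTTR json files alongside them
```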