[Papers with Code: Fine-Grained Image Classification on CUB-200](https://paperswithcode.com/sota/fine-grained-image-classification-on-cub-200?p=metaformer-a-unified-meta-framework-for-fine)
[Papers with Code: Fine-Grained Image Classification on NABirds](https://paperswithcode.com/sota/fine-grained-image-classification-on-nabirds?p=metaformer-a-unified-meta-framework-for-fine)
[Papers with Code: Image Classification on iNaturalist](https://paperswithcode.com/sota/image-classification-on-inaturalist?p=metaformer-a-unified-meta-framework-for-fine)
[Papers with Code: Image Classification on iNaturalist 2018](https://paperswithcode.com/sota/image-classification-on-inaturalist-2018?p=metaformer-a-unified-meta-framework-for-fine)
# MetaFormer
This repository contains the code used to create and train the model described in “MetaFormer: A Unified Meta Framework for Fine-Grained Recognition” [arXiv:2203.02751](http://arxiv.org/abs/2203.02751).

## Model zoo
| name | resolution | 1k model | 21k model | iNat21 model |
| :--------: | :----------: | :--------: | :----------: | :------------: |
| MetaFormer-0 | 224x224 | [metafg_0_1k_224](https://drive.google.com/file/d/1BYbe3mrKioN-Ara6hhJiaiEgJLl_thSH/view?usp=sharing)|[metafg_0_21k_224](https://drive.google.com/file/d/1834jQ9OPHOBZDgv7jD6Qu5mNLsD9aeZv/view?usp=sharing)|-|
| MetaFormer-1 | 224x224 | [metafg_1_1k_224](https://drive.google.com/file/d/1p-nIZgnrDatqmSzzDknTFYw-yEEUD_Rz/view?usp=sharing)|[metafg_1_21k_224](https://drive.google.com/file/d/1AcybDVEY-kXFT0D79w1G7I0h4r1IxLlG/view?usp=sharing)|-|
| MetaFormer-2 | 224x224 | [metafg_2_1k_224](https://drive.google.com/file/d/1K6EEyFKbMUBpPqaEJMvo93YHTXCsgH2V/view?usp=sharing)|[metafg_2_21k_224](https://drive.google.com/file/d/1VygaD_IwYq25KwoupWfttKRZUm2_SPeK/view?usp=sharing)|-|
| MetaFormer-0 | 384x384 | [metafg_0_1k_384](https://drive.google.com/file/d/1r62S3CJFRWV_qA5udC9MOFOJYwRf8mE2/view?usp=sharing) | [metafg_0_21k_384](https://drive.google.com/file/d/1wVmlPjNTA6JKHcF3ROGorEVPxKVO83Ss/view?usp=sharing) | [metafg_0_inat21_384](https://drive.google.com/file/d/11gCk_IuSN7krdkOUSWSM4xlf8GGknmxc/view?usp=sharing) |
| MetaFormer-1 | 384x384 | [metafg_1_1k_384](https://drive.google.com/file/d/12OTmZg4J6fMGvs-colOTDfmhdA5EMMvo/view?usp=sharing) | [metafg_1_21k_384](https://drive.google.com/file/d/13dsarbtsNrkhpG5XpCRlN5ogXDGXO3Z_/view?usp=sharing) | [metafg_1_inat21_384](https://drive.google.com/file/d/1ATUIrDxaQaGqx4lJ8HE2IwX_evMhblPu/view?usp=sharing) |
| MetaFormer-2 | 384x384 | [metafg_2_1k_384](https://drive.google.com/file/d/167oBaseORq32aFA3Ex6lpHuasvu2PMb8/view?usp=sharing) | [metafg_2_21k_384](https://drive.google.com/file/d/1PnpntloQaYduEokFGQ6y79G7DdyjD_u3/view?usp=sharing) | [metafg_2_inat21_384](https://drive.google.com/file/d/17sUNST7ivQhonBAfZEiTOLAgtaHa4F3e/view?usp=sharing) |
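You can sanity-check a downloaded checkpoint before training. A minimal sketch in Python (the top-level `model` key is an assumption carried over from Swin-style checkpoints, and the file name is just one of the downloads above):
```
import torch

# Peek at a downloaded checkpoint on CPU before passing it to --pretrain.
# The "model" key is an assumption (Swin-style checkpoints); fall back to
# the raw dict if it is absent.
ckpt = torch.load("./pretrained/metafg_0_1k_224.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```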
## Usage
#### Dependencies
* install `torch` and `torchvision`
```
pip install torch==1.5.1 torchvision==0.6.1
```
* install `timm`
```
pip install timm==0.4.5
```
* install `Apex`
```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
* install other requirements
```
pip install opencv-python==4.5.1.48 yacs==0.1.8
```
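A quick import check confirms the environment above is usable (a minimal sketch; Apex is reported separately since it is built from source):
```
# Quick import check for the pinned dependencies above.
import torch
import torchvision
import timm
import cv2
from yacs.config import CfgNode  # noqa: F401

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("timm:", timm.__version__)
print("opencv:", cv2.__version__)

try:
    # Apex is required for the mixed-precision training used by this repo.
    from apex import amp  # noqa: F401
    print("apex: OK")
except ImportError:
    print("apex: not installed")
```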
#### Data preparation
Download [iNat21/18/17](https://github.com/visipedia/inat_comp), [CUB](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), [NABirds](https://dl.allaboutbirds.org/nabirds), [Stanford Cars](https://ai.stanford.edu/~jkrause/cars/car_dataset.html), and [FGVC-Aircraft](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/), put each one in its own folder (\<root\>/datasets/<dataset_name>), and unzip it. The folder structure is as follows:
```
datasets
├── inaturelist2021
│   ├── train
│   ├── val
│   ├── train.json
│   └── val.json
├── inaturelist2018
│   ├── train_val_images
│   ├── train2018.json
│   ├── val2018.json
│   ├── train2018_locations.json
│   ├── val2018_locations.json
│   └── categories.json
├── inaturelist2017
│   ├── train_val_images
│   ├── train2017.json
│   ├── val2017.json
│   ├── train2017_locations.json
│   └── val2017_locations.json
├── cub-200
│   └── ...
├── nabirds
│   └── ...
├── stanfordcars
│   ├── car_ims
│   └── cars_annos.mat
└── aircraft
    └── ...
```
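Before training, a short sketch like the following can verify the layout (the `ROOT` path and the entries checked are illustrative; extend `EXPECTED` to the datasets you actually use):
```
import os

# Illustrative check of the expected layout from the tree above.
ROOT = "./datasets"
EXPECTED = {
    "inaturelist2021": ["train", "val", "train.json", "val.json"],
    "stanfordcars": ["car_ims", "cars_annos.mat"],
    "cub-200": [],
}

for dataset, entries in EXPECTED.items():
    base = os.path.join(ROOT, dataset)
    if not os.path.isdir(base):
        print("missing dataset folder:", base)
        continue
    for entry in entries:
        path = os.path.join(base, entry)
        status = "ok  " if os.path.exists(path) else "MISS"
        print(status, path)
```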
#### Training
You can download pre-trained models from the model zoo above and put them under \<root\>/pretrained.
To train MetaFG on a dataset, run:
```
python3 -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --cfg <config-file> --dataset <dataset-name> --pretrain <pretrained-model-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```
\<dataset-name\>: inaturelist2021, inaturelist2018, inaturelist2017, cub-200, nabirds, stanfordcars, aircraft
For CUB-200-2011, run:
```
python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg ./configs/MetaFG_1_224.yaml --batch-size 32 --tag cub-200_v1 --lr 5e-5 --min-lr 5e-7 --warmup-lr 5e-8 --epochs 300 --warmup-epochs 20 --dataset cub-200 --pretrain ./pretrained_model/<xxxx>.pth --accumulation-steps 2 --opts DATA.IMG_SIZE 384
```
Note that the final learning rate is scaled linearly with the total batch size: effective lr = base lr × total_batch_size / 512.
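For example, the CUB-200-2011 command above uses 8 GPUs × batch size 32 × 2 accumulation steps, i.e. a total batch size of 512, so the base learning rate of 5e-5 is applied unchanged. A minimal sketch of this scaling rule, assuming the linear rule stated above:
```
def effective_lr(base_lr, num_gpus, batch_per_gpu, accum_steps=1, ref_batch=512):
    """Linear scaling rule: base_lr * total_batch_size / ref_batch."""
    total_batch = num_gpus * batch_per_gpu * accum_steps
    return base_lr * total_batch / ref_batch

# The CUB-200 command above: 8 GPUs x 32 per GPU x 2 accumulation steps = 512,
# so the base lr of 5e-5 is kept as-is.
print(effective_lr(5e-5, num_gpus=8, batch_per_gpu=32, accum_steps=2))  # 5e-05
```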
#### Eval
To evaluate a model on a dataset, run:
```
python3 -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval --cfg <config-file> --dataset <dataset-name> --resume <checkpoint> [--batch-size <batch-size-per-gpu>]
```
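For example, to evaluate a MetaFormer-1 checkpoint on CUB-200-2011 at 384x384 on a single GPU (the checkpoint path is a placeholder):
```
python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg ./configs/MetaFG_1_224.yaml --dataset cub-200 --resume ./output/<checkpoint>.pth --batch-size 32 --opts DATA.IMG_SIZE 384
```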
## Main Results
#### ImageNet-1k
| Name | Resolution | #Params | FLOPs | Throughput (images/s) | Top-1 acc (%) |
| :--------: | :----------: | :--------: | :----------: | :------------: | :------------: |
| MetaFormer-0 | 224x224 | 28M | 4.6G | 840.1 | 82.9 |
| MetaFormer-1 | 224x224 | 45M | 8.5G | 444.8 | 83.9 |
| MetaFormer-2 | 224x224 | 81M | 16.9G | 438.9 | 84.1 |
| MetaFormer-0 | 384x384 | 28M | 13.4G | 349.4 | 84.2 |
| MetaFormer-1 | 384x384 | 45M | 24.7G | 165.3 | 84.4 |
| MetaFormer-2 | 384x384 | 81M | 49.7G | 132.7 | 84.6 |
#### Fine-grained Datasets
Results on fine-grained datasets with different pre-trained models:
| Name | Pretrain | CUB | NABirds | iNat2017 | iNat2018 | Cars | Aircraft |
| :--------: | :----------: | :--------: | :----------: | :------------: | :------------: | :--------: |:--------: |
| MetaFormer-0|ImageNet-1k|89.6|89.1|75.7|79.5|95.0|91.2|
| MetaFormer-0|ImageNet-21k|89.7|89.5|75.8|79.9|94.6|91.2|
| MetaFormer-0|iNaturalist 2021|91.8|91.5|78.3|82.9|95.1|87.4|
| MetaFormer-1|ImageNet-1k|89.7|89.4|78.2|81.9|94.9|90.8|
| MetaFormer-1|ImageNet-21k|91.3|91.6|79.4|83.2|95.0|92.6|
| MetaFormer-1|iNaturalist 2021|92.3|92.7|82.0|87.5|95.0|92.5|
| MetaFormer-2|ImageNet-1k|89.7|89.7|79.0|82.6|95.0|92.4|
| MetaFormer-2|ImageNet-21k|91.8|92.2|80.4|84.3|95.1|92.9|
| MetaFormer-2|iNaturalist 2021|92.9|93.0|82.8|87.7|95.4|92.8|
Results on iNaturalist 2017, iNaturalist 2018, and iNaturalist 2021 with meta-information:
| Name | Pretrain | Meta added| iNat2017 | iNat2018 | iNat2021 |
| :--------: | :----------: | :--------: | :----------: | :------------: | :------------: |
|MetaFormer-0|ImageNet-1k|N|75.7|79.5|88.4|
|MetaFormer-0|ImageNet-1k|Y|79.8(+4.1)|85.4(+5.9)|92.6(+4.2)|
|MetaFormer-1|ImageNet-1k|N|78.2|81.9|90.2|
|MetaFormer-1|ImageNet-1k|Y|81.3(+3.1)|86.5(+4.6)|93.4(+3.2)|
|MetaFormer-2|ImageNet-1k|N|79.0|82.6|89.8|
|MetaFormer-2|ImageNet-1k|Y|82.0(+3.0)|86.8(+4.2)|93.2(+3.4)|
|MetaFormer-2|ImageNet-21k|N|80.4|84.3|90.3|
|MetaFormer-2|ImageNet-21k|Y|83.4(+3.0)|88.7(+4.4)|93.6(+3.3)|
## Citation
```
@article{MetaFormer,
title={MetaFormer: A Unified Meta Framework for Fine-Grained Recognition},
author={Diao, Qishuai and Jiang, Yi and Wen, Bin and Sun, Jia and Yuan, Zehuan},
journal={arXiv preprint arXiv:2203.02751},
year={2022},
}
```
## Acknowledgement
Many thanks to [Swin Transformer](https://github.com/microsoft/Swin-Transformer); part of the code is borrowed from it.