Image Classification
timm
PyTorch
Safetensors
Transformers
pcuenq (HF staff) and rwightman (HF staff) committed
Commit b16c814 · verified · 0 parent(s)

Duplicate from timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k

Co-authored-by: Ross Wightman <[email protected]>

Files changed (6)
  1. .gitattributes +34 -0
  2. README.md +168 -0
  3. config.json +33 -0
  4. model.safetensors +3 -0
  5. pytorch_model.bin +3 -0
  6. train_args.yaml +119 -0
.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
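As a rough illustration of what these attributes do, the sketch below checks which file names from this commit would be routed through Git LFS. It approximates gitattributes matching with Python's `fnmatchcase` on the base name, an assumption that holds for the simple `*.ext` patterns here but not for path patterns such as `saved_model/**/*`. `LFS_PATTERNS` (a subset of the rules) and `is_lfs_tracked` are hypothetical names used only for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes above (illustrative only).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.pth", "*.onnx", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's base name against simple glob patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for filename in ["model.safetensors", "pytorch_model.bin", "config.json", "README.md"]:
        kind = "LFS" if is_lfs_tracked(filename) else "regular git"
        print(f"{filename} -> {kind}")
```

This is consistent with the file list above, where only `model.safetensors` and `pytorch_model.bin` appear as three-line pointer files.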
README.md ADDED
@@ -0,0 +1,168 @@
+ ---
+ tags:
+ - image-classification
+ - timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - laion-2b
+ - imagenet-12k
+ ---
+ # Model card for vit_base_patch32_clip_448.laion2b_ft_in12k_in1k
+
+ A Vision Transformer (ViT) image classification model. Pretrained on LAION-2B image-text pairs using OpenCLIP. Fine-tuned on ImageNet-12k and then ImageNet-1k in `timm`. See recipes in [Reproducible scaling laws](https://arxiv.org/abs/2212.07143).
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 88.3
+   - GMACs: 17.2
+   - Activations (M): 16.5
+   - Image size: 448 x 448
+ - **Papers:**
+   - OpenCLIP: https://github.com/mlfoundations/open_clip
+   - Reproducible scaling laws for contrastive language-image learning: https://arxiv.org/abs/2212.07143
+   - LAION-5B: An open large-scale dataset for training next generation image-text models: https://arxiv.org/abs/2210.08402
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-1k
+ - **Pretrain Dataset:**
+   - LAION-2B
+   - ImageNet-12k
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # needed for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_base_patch32_clip_448.laion2b_ft_in12k_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_base_patch32_clip_448.laion2b_ft_in12k_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 197, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @software{ilharco_gabriel_2021_5143773,
+   author = {Ilharco, Gabriel and
+             Wortsman, Mitchell and
+             Wightman, Ross and
+             Gordon, Cade and
+             Carlini, Nicholas and
+             Taori, Rohan and
+             Dave, Achal and
+             Shankar, Vaishaal and
+             Namkoong, Hongseok and
+             Miller, John and
+             Hajishirzi, Hannaneh and
+             Farhadi, Ali and
+             Schmidt, Ludwig},
+   title = {OpenCLIP},
+   month = jul,
+   year = 2021,
+   note = {If you use this software, please cite it as below.},
+   publisher = {Zenodo},
+   version = {0.1},
+   doi = {10.5281/zenodo.5143773},
+   url = {https://doi.org/10.5281/zenodo.5143773}
+ }
+ ```
+ ```bibtex
+ @article{cherti2022reproducible,
+   title={Reproducible scaling laws for contrastive language-image learning},
+   author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
+   journal={arXiv preprint arXiv:2212.07143},
+   year={2022}
+ }
+ ```
+ ```bibtex
+ @inproceedings{schuhmann2022laionb,
+   title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
+   author={Christoph Schuhmann and
+           Romain Beaumont and
+           Richard Vencu and
+           Cade W Gordon and
+           Ross Wightman and
+           Mehdi Cherti and
+           Theo Coombes and
+           Aarush Katta and
+           Clayton Mullis and
+           Mitchell Wortsman and
+           Patrick Schramowski and
+           Srivatsa R Kundurthy and
+           Katherine Crowson and
+           Ludwig Schmidt and
+           Robert Kaczmarczyk and
+           Jenia Jitsev},
+   booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
+   year={2022},
+   url={https://openreview.net/forum?id=M3Y74vmsMcY}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
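The `(1, 197, 768)` shape quoted in the README's embeddings example follows from the patch grid: a 448 x 448 input split into 32 x 32 patches gives 14 x 14 = 196 patch tokens, plus one class token. A minimal sketch of that arithmetic, assuming a single prepended class token (which would be consistent with `"global_pool": "token"` in config.json); `vit_token_count` is a hypothetical helper, not a timm function.

```python
def vit_token_count(img_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Number of tokens a ViT sees for a square image cut into non-overlapping patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size  # patches per side
    return grid * grid + num_prefix_tokens


if __name__ == "__main__":
    # 448 / 32 = 14 patches per side, so 14 * 14 + 1 class token
    print(vit_token_count(448, 32))
```

The last dimension, 768, is the `num_features` value recorded in config.json.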
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "architecture": "vit_base_patch32_clip_448",
+   "num_classes": 1000,
+   "num_features": 768,
+   "global_pool": "token",
+   "pretrained_cfg": {
+     "tag": "laion2b_ft_in12k_in1k",
+     "custom_load": false,
+     "input_size": [
+       3,
+       448,
+       448
+     ],
+     "fixed_input_size": true,
+     "interpolation": "bicubic",
+     "crop_pct": 1.0,
+     "crop_mode": "center",
+     "mean": [
+       0.48145466,
+       0.4578275,
+       0.40821073
+     ],
+     "std": [
+       0.26862954,
+       0.26130258,
+       0.27577711
+     ],
+     "num_classes": 1000,
+     "pool_size": null,
+     "first_conv": "patch_embed.proj",
+     "classifier": "head"
+   }
+ }
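The `mean` and `std` entries in `pretrained_cfg` are the per-channel statistics used to normalize inputs. A minimal sketch of how such values are typically applied to a single RGB pixel, assuming 8-bit input that is first scaled to [0, 1]; `normalize_pixel` is a hypothetical helper, and in practice `timm.data.create_transform` (as used in the README) builds this step from the config.

```python
import json

# Trimmed copy of the relevant fields from config.json above.
CONFIG = json.loads("""
{
  "pretrained_cfg": {
    "input_size": [3, 448, 448],
    "mean": [0.48145466, 0.4578275, 0.40821073],
    "std": [0.26862954, 0.26130258, 0.27577711]
  }
}
""")


def normalize_pixel(rgb, mean, std):
    """Scale 8-bit RGB to [0, 1], then subtract the mean and divide by the std per channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std)]


if __name__ == "__main__":
    cfg = CONFIG["pretrained_cfg"]
    # A pure white pixel ends up above zero in every channel after normalization.
    print(normalize_pixel((255, 255, 255), cfg["mean"], cfg["std"]))
```

A pixel whose scaled value equals the channel mean maps to zero, which is the point of the normalization.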
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2b637111126f3557cd93e1b33af80dd1a7ae713e74cc30aa41abefe6f6b2312b
+ size 353365756
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c080dbc3ae19656bbba11616f67fc53fc7b9c1aaf60627bb0591e762c41262d0
+ size 353372645
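Both weight files are committed as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper. The final division is only a rough sanity check that assumes 4-byte float32 weights and ignores the safetensors header, though it lands close to the 88.3M parameters listed in the model card.

```python
# Pointer text copied from the model.safetensors diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2b637111126f3557cd93e1b33af80dd1a7ae713e74cc30aa41abefe6f6b2312b
size 353365756
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty line into key and value at the first space."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["hash_algo"], info["size"])
    # Rough estimate only: assumes every byte belongs to a float32 weight.
    print(f"~{info['size'] / 4 / 1e6:.1f}M float32 values")
```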
train_args.yaml ADDED
@@ -0,0 +1,119 @@
+ aa: rand-m10-inc1-mstd101
+ amp: true
+ aot_autograd: false
+ apex_amp: false
+ aug_repeats: 0
+ aug_splits: 0
+ batch_size: 512
+ bce_loss: false
+ bce_target_thresh: null
+ bn_eps: null
+ bn_momentum: null
+ channels_last: false
+ checkpoint_hist: 10
+ class_map: ''
+ clip_grad: 3.0
+ clip_mode: norm
+ color_jitter: 0.4
+ cooldown_epochs: 10
+ crop_pct: 1.0
+ cutmix: 0.0
+ cutmix_minmax: null
+ data_dir: /data/imagenet/
+ dataset: ''
+ dataset_download: false
+ decay_epochs: 100
+ decay_milestones:
+ - 30
+ - 60
+ decay_rate: 0.1
+ dist_bn: reduce
+ drop: 0.0
+ drop_block: null
+ drop_connect: null
+ drop_path: 0.1
+ epoch_repeats: 0.0
+ epochs: 50
+ eval_metric: top1
+ experiment: ''
+ fast_norm: false
+ fuser: ''
+ gp: null
+ grad_checkpointing: true
+ hflip: 0.5
+ img_size: 448
+ in_chans: null
+ initial_checkpoint: ''
+ input_size: null
+ interpolation: ''
+ jsd_loss: false
+ layer_decay: 0.75
+ local_rank: 0
+ log_interval: 50
+ log_wandb: false
+ lr: 0.0001
+ lr_cycle_decay: 0.5
+ lr_cycle_limit: 1
+ lr_cycle_mul: 1.0
+ lr_k_decay: 1.0
+ lr_noise: null
+ lr_noise_pct: 0.67
+ lr_noise_std: 1.0
+ mean: null
+ min_lr: 5.0e-07
+ mixup: 0.0
+ mixup_mode: batch
+ mixup_off_epoch: 0
+ mixup_prob: 1.0
+ mixup_switch_prob: 0.5
+ model: vit_base_patch32_clip.laion2b_ft_in12k
+ model_ema: true
+ model_ema_decay: 0.9998
+ model_ema_force_cpu: false
+ momentum: 0.9
+ native_amp: false
+ no_aug: false
+ no_ddp_bb: false
+ no_prefetcher: false
+ no_resume_opt: false
+ num_classes: 1000
+ opt: adamw
+ opt_betas: null
+ opt_eps: null
+ output: ''
+ patience_epochs: 10
+ pin_mem: false
+ pretrained: true
+ ratio:
+ - 0.75
+ - 1.3333333333333333
+ recount: 1
+ recovery_interval: 0
+ remode: pixel
+ reprob: 0.3
+ resplit: false
+ resume: ''
+ save_images: false
+ scale:
+ - 0.08
+ - 1.0
+ sched: cosine
+ seed: 42
+ smoothing: 0.1
+ split_bn: false
+ start_epoch: null
+ std: null
+ sync_bn: false
+ torchscript: false
+ train_interpolation: random
+ train_split: train
+ tta: 0
+ use_multi_epochs_loader: false
+ val_split: validation
+ validation_batch_size: null
+ vflip: 0.0
+ warmup_epochs: 10
+ warmup_lr: 1.0e-06
+ weight_decay: 0.01
+ worker_seeding: all
+ workers: 8
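To make the schedule-related arguments concrete (`sched: cosine`, `lr: 0.0001`, `min_lr: 5.0e-07`, `warmup_epochs: 10`, `warmup_lr: 1.0e-06`, `epochs: 50`), here is a simplified per-epoch sketch: linear warmup followed by a single cosine cycle. It is an approximation for illustration, not timm's scheduler. It assumes the warmup epochs count toward the 50 total and ignores `cooldown_epochs`, `layer_decay`, and per-step updates; `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-4, min_lr=5e-7, warmup_lr=1e-6,
                warmup_epochs=10, total_epochs=50):
    """Learning rate at the start of an epoch under warmup plus cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 5, 10, 30, 50):
        print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

With `layer_decay: 0.75`, the actual run would additionally scale this rate down for earlier layers, which the sketch leaves out.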