rwightman (HF staff) committed
Commit 396027b · 1 Parent(s): adb811b
Files changed (4)
  1. README.md +116 -0
  2. config.json +33 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ tags:
+ - image-classification
+ - timm
+ library_tag: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - imagenet-21k
+ ---
+ # Model card for mixer_b16_224.miil_in21k_ft_in1k
+
+ An MLP-Mixer image classification model. Pretrained on ImageNet-21k and fine-tuned on ImageNet-1k by [Alibaba-MIIL](https://github.com/Alibaba-MIIL).
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 59.9
+   - GMACs: 12.6
+   - Activations (M): 14.5
+   - Image size: 224 x 224
+ - **Papers:**
+   - MLP-Mixer: An all-MLP Architecture for Vision: https://arxiv.org/abs/2105.01601
+   - ImageNet-21K Pretraining for the Masses: https://arxiv.org/abs/2104.10972
+ - **Original:** https://github.com/Alibaba-MIIL/ImageNet21K
+ - **Dataset:** ImageNet-1k
+ - **Pretrain Dataset:** ImageNet-21k
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # required for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('mixer_b16_224.miil_in21k_ft_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'mixer_b16_224.miil_in21k_ft_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 196, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @article{tolstikhin2021mixer,
+   title={MLP-Mixer: An all-MLP Architecture for Vision},
+   author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Steiner, Andreas and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
+   journal={arXiv preprint arXiv:2105.01601},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{ridnik2021imagenet21k,
+   title={ImageNet-21K Pretraining for the Masses},
+   author={Tal Ridnik and Emanuel Ben-Baruch and Asaf Noy and Lihi Zelnik-Manor},
+   year={2021},
+   eprint={2104.10972},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
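
Note on the `(1, 196, 768)` shape quoted in the Image Embeddings example: it can be checked by hand, since a 224 x 224 input cut into 16 x 16 patches gives 14 x 14 = 196 tokens, each carrying 768 features. The sketch below reproduces that arithmetic in plain Python. The patch size of 16 is an assumption read from the `b16` in the model name, and `unpooled_shape` is a hypothetical helper for illustration, not a timm function.

```python
def unpooled_shape(batch_size, image_size, patch_size, num_features):
    """Return the (batch, tokens, features) shape of unpooled patch features."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return (batch_size, patches_per_side ** 2, num_features)


# patch_size=16 is assumed from the model name; 768 is num_features in config.json
shape = unpooled_shape(batch_size=1, image_size=224, patch_size=16, num_features=768)
print(shape)  # (1, 196, 768)
```

Because `fixed_input_size` is true in config.json, the token count is tied to the 224 x 224 input; a different resolution would change the first factor and is not expected to work with these weights as published.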
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "architecture": "mixer_b16_224",
+   "num_classes": 1000,
+   "num_features": 768,
+   "global_pool": "avg",
+   "pretrained_cfg": {
+     "tag": "miil_in21k_ft_in1k",
+     "custom_load": false,
+     "input_size": [
+       3,
+       224,
+       224
+     ],
+     "fixed_input_size": true,
+     "interpolation": "bilinear",
+     "crop_pct": 0.875,
+     "crop_mode": "center",
+     "mean": [
+       0.0,
+       0.0,
+       0.0
+     ],
+     "std": [
+       1.0,
+       1.0,
+       1.0
+     ],
+     "num_classes": 1000,
+     "pool_size": null,
+     "first_conv": "stem.proj",
+     "classifier": "head"
+   }
+ }
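
Note on the preprocessing fields in `pretrained_cfg`: a minimal sketch of what they imply, assuming the common evaluation pipeline of resize, center crop, then per-channel normalization. With `crop_pct` 0.875 the image would be resized to 256 before a 224 center crop, and with `mean` 0 and `std` 1 the normalization step leaves pixel values in [0, 1] unchanged. `resize_target` and `normalize` are hypothetical helpers written for this note; the exact rounding that timm applies may differ.

```python
import math

# Values copied from the pretrained_cfg block of config.json
pretrained_cfg = {
    "input_size": [3, 224, 224],
    "crop_pct": 0.875,
    "mean": [0.0, 0.0, 0.0],
    "std": [1.0, 1.0, 1.0],
}


def resize_target(input_size, crop_pct):
    """Size to resize to before center-cropping back to input_size (assumed floor rounding)."""
    _, height, width = input_size
    return math.floor(height / crop_pct), math.floor(width / crop_pct)


def normalize(pixel, mean, std):
    """Per-channel (value - mean) / std on a pixel already scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


resize_hw = resize_target(pretrained_cfg["input_size"], pretrained_cfg["crop_pct"])
print(resize_hw)  # (256, 256)

pixel = [0.2, 0.5, 1.0]
print(normalize(pixel, pretrained_cfg["mean"], pretrained_cfg["std"]))  # [0.2, 0.5, 1.0]
```

The practical consequence is that these weights expect inputs in the [0, 1] range rather than ImageNet-standardized values, which is why the README resolves the transform from the model instead of hard-coding one.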
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0f9cd5dd3261082aa2178503fc4e2f12d97c07d12a5cfbdc9041cfe4eea9a34
+ size 239536434
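
Note on the `size` field: it gives a rough cross-check of the 59.9 M parameter count listed in the README. The sketch below assumes the weights are stored as 4-byte float32 values and ignores the safetensors header, so it slightly overestimates the count; both points are assumptions, not something the diff states.

```python
SAFETENSORS_SIZE_BYTES = 239_536_434  # from the LFS pointer above
BYTES_PER_FLOAT32 = 4                 # assumed storage dtype

# Upper bound on the parameter count, in millions
approx_params_millions = SAFETENSORS_SIZE_BYTES / BYTES_PER_FLOAT32 / 1e6
print(f"{approx_params_millions:.1f}")  # 59.9
```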
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2516f9c59f04a00ec2cea36af8c2fc856195821577d4f1aea74d90c0f4910db0
+ size 239577701
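
Note on the two weight files: what the commit stores for each is a Git LFS pointer, a three-line text file giving the spec version, a SHA-256 object id, and the byte size of the real file. The sketch below parses such a pointer and checks a blob of bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, not part of any Git LFS or Hugging Face tooling.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2516f9c59f04a00ec2cea36af8c2fc856195821577d4f1aea74d90c0f4910db0
size 239577701
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check raw bytes against the pointer's size and SHA-256 digest."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 oids are handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])  # sha256 239577701
```

For a file of this size (about 240 MB), hashing in fixed-size chunks read from disk would be preferable to loading all bytes at once; the in-memory version is kept here for brevity.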