hassonofer committed on
Commit eb54138 · verified · 1 Parent(s): 062d799

Update README.md

  1. README.md +105 -3
README.md CHANGED
@@ -1,3 +1,105 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - image-classification
+ - birder
+ - pytorch
+ library_name: birder
+ license: apache-2.0
+ ---
+
+ # Model Card for swin_transformer_v2_s_intermediate-arabian-peninsula
+
+ A Swin Transformer v2 image classification model. The model was trained in two stages: intermediate training on a large-scale dataset of diverse bird species from around the world, followed by fine-tuning on the `arabian-peninsula` dataset (all relevant bird species found in the Arabian Peninsula, including rarities).
+
+ The species list is derived from data available at <https://avibase.bsc-eoc.org/checklist.jsp?region=ARA>.
+
+ ## Model Details
+
+ - **Model Type:** Image classification and detection backbone
+ - **Model Stats:**
+     - Params (M): 49.5
+     - Input image size: 384 x 384
+ - **Dataset:** arabian-peninsula (735 classes)
+     - Intermediate training involved ~4500 species from Asia, Europe and eastern Africa
+
+ - **Papers:**
+     - Swin Transformer V2: Scaling Up Capacity and Resolution: <https://arxiv.org/abs/2111.09883>
+
+ ## Model Usage
+
+ ### Image Classification
+
+ ```python
+ import birder
+ from birder.inference.classification import infer_image
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("swin_transformer_v2_s_intermediate-arabian-peninsula", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
+ (out, _) = infer_image(net, image, transform)
+ # out is a NumPy array with shape of (1, num_classes), representing class probabilities.
+ ```
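
To turn the probability array into readable predictions, the class indices have to be mapped back to class names. The following is a minimal sketch and not part of birder: `top_k_predictions` is a hypothetical helper, and it assumes `class_to_idx` maps class names to integer indices, as the name suggests. The stand-in inputs below are for illustration only.

```python
def top_k_predictions(probs, class_to_idx, k=5):
    """Return the k most probable (class_name, probability) pairs.

    probs: one row of class probabilities, e.g. out[0] from infer_image.
    class_to_idx: mapping of class name -> class index (assumed layout).
    """
    # Invert the mapping so indices can be translated back to names
    idx_to_class = {idx: name for (name, idx) in class_to_idx.items()}

    # Sort class indices by descending probability
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    return [(idx_to_class[i], float(probs[i])) for i in ranked[:k]]


# Stand-in values, not real model output
example_class_to_idx = {"species_a": 0, "species_b": 1, "species_c": 2}
example_out = [[0.1, 0.7, 0.2]]

print(top_k_predictions(example_out[0], example_class_to_idx, k=2))
# [('species_b', 0.7), ('species_c', 0.2)]
```

With the real model, the call would presumably be `top_k_predictions(out[0], class_to_idx)`, since `out` has shape `(1, num_classes)`.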
+
+ ### Image Embeddings
+
+ ```python
+ import birder
+ from birder.inference.classification import infer_image
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("swin_transformer_v2_s_intermediate-arabian-peninsula", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = "path/to/image.jpeg"  # or a PIL image
+ (out, embedding) = infer_image(net, image, transform, return_embedding=True)
+ # embedding is a NumPy array with shape of (1, embedding_size)
+ ```
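
A common use of such embeddings is comparing two images by cosine similarity. The sketch below is an illustration under assumptions, not something the model card prescribes: `cosine_similarity` is a hypothetical pure-Python helper that expects two 1-D vectors of equal length, such as `embedding[0]` from two separate `infer_image` calls. The example vectors are stand-ins.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero 1-D vectors."""
    dot = sum(x * y for (x, y) in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in vectors, not real embeddings
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]  # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

print(round(cosine_similarity(vec_a, vec_b), 6))  # 1.0
print(round(cosine_similarity(vec_a, vec_c), 6))  # 0.0
```

Values close to 1 indicate embeddings pointing in nearly the same direction; how well that tracks visual or species similarity for this model is not stated in the card.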
+
+ ### Detection Feature Map
+
+ ```python
+ from PIL import Image
+ import birder
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("swin_transformer_v2_s_intermediate-arabian-peninsula", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ # Convert to RGB so grayscale or CMYK files also yield a 3-channel input
+ image = Image.open("path/to/image.jpeg").convert("RGB")
+ features = net.detection_features(transform(image).unsqueeze(0))
+ # features is a dict (stage name -> torch.Tensor)
+ print([(k, v.size()) for k, v in features.items()])
+ # Output example:
+ # [('stage1', torch.Size([1, 96, 96, 96])),
+ #  ('stage2', torch.Size([1, 192, 48, 48])),
+ #  ('stage3', torch.Size([1, 384, 24, 24])),
+ #  ('stage4', torch.Size([1, 768, 12, 12]))]
+ ```
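
The example shapes above imply the downsampling factor (stride) of each stage relative to the 384 x 384 input, which is often what a detection head needs to know. The following sketch only redoes that arithmetic from the numbers shown in the output example; the variable names are illustrative.

```python
# Input resolution from the model card and spatial sizes from the output example
input_size = 384
feature_sizes = {"stage1": 96, "stage2": 48, "stage3": 24, "stage4": 12}

# Stride = input resolution divided by the feature map resolution
strides = {name: input_size // size for (name, size) in feature_sizes.items()}

print(strides)
# {'stage1': 4, 'stage2': 8, 'stage3': 16, 'stage4': 32}
```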
+
+ ## Citation
+
+ ```bibtex
+ @misc{liu2022swintransformerv2scaling,
+   title={Swin Transformer V2: Scaling Up Capacity and Resolution},
+   author={Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo},
+   year={2022},
+   eprint={2111.09883},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2111.09883},
+ }
+ ```