---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
---

# Model Card for xcit_nano12_p16_il-common

An XCiT image classification model. This model was trained on the `il-common` dataset, which contains common bird species found in Israel.

The species list is derived from data available at <https://www.israbirding.com/checklist/>.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
    - Params (M): 3.0
    - Input image size: 256 x 256
- **Dataset:** il-common (371 classes)

- **Papers:**
    - XCiT: Cross-Covariance Image Transformers: <https://arxiv.org/abs/2106.09681>

## Model Usage

### Image Classification

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("xcit_nano12_p16_il-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape of (1, 371), representing class probabilities.
```
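The `out` array holds one probability per class, so the most likely predictions can be read off with plain NumPy. The sketch below is a hedged illustration, not part of the birder API. The `top_k` helper is hypothetical, and a random softmax stands in for real model output. Mapping the returned indices back to species names requires the dataset's class list, which is not covered here.

```python
import numpy as np


def top_k(probs: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Hypothetical helper: (class_index, probability) pairs for the k most likely classes."""
    row = probs[0]  # drop the batch dimension: (1, 371) -> (371,)
    idx = np.argsort(row)[::-1][:k]  # class indices sorted by descending probability
    return [(int(i), float(row[i])) for i in idx]


# Hypothetical stand-in for `out`: a softmax over 371 random logits
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 371))
out = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

for class_idx, prob in top_k(out, k=3):
    print(f"class {class_idx}: {prob:.3f}")
```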

### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("xcit_nano12_p16_il-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape of (1, 128)
```
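A common use for these 128-dimensional embeddings is to compare images, for example to find visually similar photos. The sketch below assumes cosine similarity as the comparison metric, which is one reasonable choice rather than anything the model prescribes. The `cosine_similarity` helper is hypothetical, and the random arrays stand in for real `embedding` outputs.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical helper: cosine similarity between two (1, D) embeddings."""
    a = a.ravel()  # (1, 128) -> (128,)
    b = b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical stand-ins for two `embedding` arrays returned by infer_image
rng = np.random.default_rng(42)
embedding_a = rng.normal(size=(1, 128))
embedding_b = rng.normal(size=(1, 128))

print(cosine_similarity(embedding_a, embedding_a))  # approximately 1.0 (identical vectors)
print(cosine_similarity(embedding_a, embedding_b))  # some value in [-1, 1]
```

Values close to 1 indicate embeddings pointing in the same direction. Normalizing embeddings once and using dot products is an equivalent, cheaper option when comparing many images.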

### Detection Feature Map

```python
from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("xcit_nano12_p16_il-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('stage1', torch.Size([1, 128, 16, 16])),
#  ('stage2', torch.Size([1, 128, 16, 16])),
#  ('stage3', torch.Size([1, 128, 16, 16])),
#  ('stage4', torch.Size([1, 128, 16, 16]))]
```
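The 16 x 16 spatial size in the example output matches the patch arithmetic implied by the model name: `p16` denotes 16-pixel patches, so a 256 x 256 input yields a 256 / 16 = 16 token grid per side. The helper below is a hypothetical sketch of that arithmetic, not a birder function.

```python
def patch_grid(image_size: tuple[int, int], patch_size: int = 16) -> tuple[int, int]:
    """Hypothetical helper: number of patches along (height, width)."""
    h, w = image_size
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (h // patch_size, w // patch_size)


grid = patch_grid((256, 256), patch_size=16)
print(grid)               # (16, 16)
print(grid[0] * grid[1])  # 256 patch tokens per image
```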

## Citation

```bibtex
@misc{elnouby2021xcitcrosscovarianceimagetransformers,
      title={XCiT: Cross-Covariance Image Transformers},
      author={Alaaeldin El-Nouby and Hugo Touvron and Mathilde Caron and Piotr Bojanowski and Matthijs Douze and Armand Joulin and Ivan Laptev and Natalia Neverova and Gabriel Synnaeve and Jakob Verbeek and Hervé Jegou},
      year={2021},
      eprint={2106.09681},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2106.09681},
}
```