---
license: apache-2.0
inference: false
pipeline_tag: image-classification
datasets:
- imagenet-1k
---
# Perceiver IO image classifier
This model is a Perceiver IO model pretrained on ImageNet-1k (about 1.28 million training images, 1,000 classes). It is weight-equivalent
to the [deepmind/vision-perceiver-fourier](https://huggingface.co/deepmind/vision-perceiver-fourier) model but based on
implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from
the `deepmind/vision-perceiver-fourier` model with a library-specific [conversion utility](#model-conversion). Both
models generate equal output for the same input.
The content of the `deepmind/vision-perceiver-fourier` [model card](https://huggingface.co/deepmind/vision-perceiver-fourier)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and
training details.
## Model description
The model is specified in Appendix A of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (2D Fourier features).
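The "2D Fourier features" mentioned above refer to the position encoding that is combined with the pixel values before they enter the model. The following is a minimal sketch of such an encoding for a single pixel position, not the library's implementation. The function name `fourier_features` is hypothetical, and the values `num_bands=64` and `max_resolution=(224, 224)` are assumptions based on the ImageNet setup described in the Perceiver papers; check the model config for the values actually used.

```python
import math


def fourier_features(pos, num_bands, max_resolution):
    """Fourier position encoding for one position (illustrative sketch).

    pos: coordinates of the position, each scaled to [-1, 1]
    num_bands: number of frequency bands per dimension (assumed value below)
    max_resolution: number of pixels along each dimension
    """
    sines, cosines = [], []
    for x, res in zip(pos, max_resolution):
        # Frequencies are linearly spaced from 1 up to the Nyquist frequency res / 2.
        if num_bands == 1:
            freqs = [1.0]
        else:
            step = (res / 2 - 1.0) / (num_bands - 1)
            freqs = [1.0 + i * step for i in range(num_bands)]
        sines.extend(math.sin(math.pi * f * x) for f in freqs)
        cosines.extend(math.cos(math.pi * f * x) for f in freqs)
    # Raw position, followed by all sine features, followed by all cosine features.
    return list(pos) + sines + cosines


# Encoding of the image center under the assumed ImageNet settings.
features = fourier_features((0.0, 0.0), num_bands=64, max_resolution=(224, 224))
# Position channels plus 3 RGB channels give the per-pixel input width.
print(len(features), len(features) + 3)
```

Under these assumptions each pixel carries 2 raw coordinates plus 2 x 64 sine and 2 x 64 cosine features, which together with the RGB values form the per-pixel input to the encoder.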
## Intended use and limitations
The model can be used for image classification.
## Usage examples
To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with extension `vision`.
```shell
pip install "perceiver-io[vision]"
```
Then the model can be used with PyTorch. Either use the model and image processor directly:
```python
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = AutoModelForImageClassification.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)

processed = processor(image, return_tensors="pt")
prediction = model(**processed).logits.argmax(dim=-1)
print(f"Predicted class = {model.config.id2label[prediction.item()]}")
```
```
Predicted class = ballplayer, baseball player
```
or use an `image-classification` pipeline:
```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

classifier = pipeline("image-classification", model=repo_id)
prediction = classifier(image)
print(f"Predicted class = {prediction[0]['label']}")
```
```
Predicted class = ballplayer, baseball player
```
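Both examples above print only the top-ranked class. As a minimal sketch of how raw logits can be turned into a ranked list of class probabilities, the following uses plain Python and toy data. The helper name `top_k_probabilities` and the toy logits and labels are hypothetical and only serve as illustration.

```python
import math


def top_k_probabilities(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by descending probability.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and labels for a 4-class toy problem.
toy_logits = [0.5, 3.2, -1.0, 1.7]
toy_labels = {0: "tabby", 1: "ballplayer, baseball player", 2: "goldfish", 3: "umpire"}

top2 = top_k_probabilities(toy_logits, toy_labels, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.3f}")
```

With the model from the first example, the same helper could presumably be called with `model(**processed).logits[0].detach().tolist()` and `model.config.id2label`.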
## Model conversion
The `krasserm/perceiver-io-img-clf` model has been created from the source `deepmind/vision-perceiver-fourier` model
with:
```python
from perceiver.model.vision.image_classifier import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-img-clf",
    source_repo_id="deepmind/vision-perceiver-fourier",
    push_to_hub=True,
)
```
## Citation
```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```