---
license: apache-2.0
inference: false
pipeline_tag: image-classification
datasets:
- imagenet-1k
---

# Perceiver IO image classifier

This model is a Perceiver IO model pretrained on ImageNet-1k (roughly 1.28 million training images, 1,000 classes). It is weight-equivalent
to the [deepmind/vision-perceiver-fourier](https://huggingface.co/deepmind/vision-perceiver-fourier) model but based on
implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from
the `deepmind/vision-perceiver-fourier` model with a library-specific [conversion utility](#model-conversion). Both
models generate equal output for the same input.

The content of the `deepmind/vision-perceiver-fourier` [model card](https://huggingface.co/deepmind/vision-perceiver-fourier)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and
training details.

## Model description

The model is specified in Appendix A of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (2D Fourier features).

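As a rough illustration of what "2D Fourier features" means, the sketch below encodes a pixel position as sine and cosine features over linearly spaced frequency bands. It is a minimal standard-library sketch, not the code used by this model: the function names `fourier_features` and `encode_pixel` are hypothetical, and the values of 64 frequency bands and a 224x224 resolution are assumptions chosen for the example.

```python
import math


def fourier_features(pos, num_bands, max_resolution):
    """Encode one coordinate in [-1, 1] as [pos, sines..., cosines...]."""
    # Frequencies are spaced linearly from 1 up to the Nyquist frequency.
    nyquist = max_resolution / 2
    if num_bands == 1:
        freqs = [1.0]
    else:
        step = (nyquist - 1.0) / (num_bands - 1)
        freqs = [1.0 + i * step for i in range(num_bands)]
    sines = [math.sin(math.pi * f * pos) for f in freqs]
    cosines = [math.cos(math.pi * f * pos) for f in freqs]
    return [pos] + sines + cosines


def encode_pixel(row, col, height, width, num_bands=64):
    """Position encoding for one pixel of a height x width image."""
    # Map pixel indices to coordinates in [-1, 1] (assumed convention).
    y = -1.0 + 2.0 * row / (height - 1)
    x = -1.0 + 2.0 * col / (width - 1)
    return fourier_features(y, num_bands, height) + fourier_features(x, num_bands, width)


# Assumed example values: 64 bands, 224x224 input.
encoding = encode_pixel(row=0, col=223, height=224, width=224)
print(len(encoding))  # 258 = 2 * (2 * 64 + 1)
```

Each spatial dimension contributes the raw coordinate plus one sine and one cosine value per band, so the encoding size grows linearly with the number of bands.
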
## Intended use and limitations

The model can be used for image classification.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with extension `vision`.

```shell
pip install "perceiver-io[vision]"
```

Then the model can be used with PyTorch. Either use the model and image processor directly:

```python
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = AutoModelForImageClassification.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)

processed = processor(image, return_tensors="pt")
prediction = model(**processed).logits.argmax(dim=-1)

print(f"Predicted class = {model.config.id2label[prediction.item()]}")
```
```
Predicted class = ballplayer, baseball player
```

or use an `image-classification` pipeline:

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

classifier = pipeline("image-classification", model=repo_id)
prediction = classifier(image)

print(f"Predicted class = {prediction[0]['label']}")
```
```
Predicted class = ballplayer, baseball player
```

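The pipeline returns a list of dictionaries with `label` and `score` keys, so more than the top label is available. The sketch below shows one way to print several candidates with their scores. The helper `format_predictions` is hypothetical and the example values are placeholders, not actual output of this model.

```python
def format_predictions(predictions, top_k=3):
    """Render the highest-scoring entries of a pipeline result as text lines."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']}: {p['score']:.1%}" for p in ranked[:top_k]]


# Placeholder values for illustration only, not actual model output.
example = [
    {"label": "class b", "score": 0.25},
    {"label": "class a", "score": 0.70},
    {"label": "class c", "score": 0.05},
]

for line in format_predictions(example, top_k=2):
    print(line)
```

With a real result, `format_predictions(classifier(image))` would be called on the list returned by the pipeline.
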
## Model conversion

The `krasserm/perceiver-io-img-clf` model has been created from the source `deepmind/vision-perceiver-fourier` model
with:

```python
from perceiver.model.vision.image_classifier import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-img-clf",
    source_repo_id="deepmind/vision-perceiver-fourier",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```