# Detecting Twenty-thousand Classes using Image-level Supervision
## Description
**Detic**: A **Det**ector with **i**mage **c**lasses that can use image-level labels to easily train detectors.
> [**Detecting Twenty-thousand Classes using Image-level Supervision**](http://arxiv.org/abs/2201.02605),
> Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
> *ECCV 2022 ([arXiv 2201.02605](http://arxiv.org/abs/2201.02605))*
## Usage
### Installation
Detic requires CLIP to be installed.
```shell
pip install git+https://github.com/openai/CLIP.git
```
### Demo
#### Inference with existing dataset vocabulary embeddings
First, go to the Detic project folder.
```shell
cd projects/Detic
```
Then, download the pre-computed CLIP embeddings from [dataset metainfo](https://github.com/facebookresearch/Detic/tree/main/datasets/metadata) to the `datasets/metadata` folder.
The CLIP embeddings will be loaded into the zero-shot classifier during inference.
For example, you can download LVIS's class name embeddings with the following command:
```shell
wget -P datasets/metadata https://raw.githubusercontent.com/facebookresearch/Detic/main/datasets/metadata/lvis_v1_clip_a%2Bcname.npy
```
You can run the demo like this:
```shell
python demo.py \
${IMAGE_PATH} \
${CONFIG_PATH} \
${MODEL_PATH} \
--show \
--score-thr 0.5 \
--dataset lvis
```
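To run the demo on a whole folder of images, the command above can be wrapped in a loop. The sketch below is a hypothetical helper (`run_demo_on_dir` is not part of the project). It assumes `demo.py` takes a single image path as its first argument, as in the example, and it reuses only the flags shown there. `CONFIG_PATH` and `MODEL_PATH` are the same placeholders as above and must be set beforehand.

```shell
# Hypothetical helper (not part of the Detic project): run the demo on every
# .jpg/.png file in a directory, one image per demo.py call.
# CONFIG_PATH and MODEL_PATH are the placeholders from the command above.
run_demo_on_dir() {
  local image_dir="$1"
  local img
  for img in "$image_dir"/*.jpg "$image_dir"/*.png; do
    # If a pattern matches nothing, the shell leaves it unexpanded; skip it.
    [ -e "$img" ] || continue
    python demo.py \
      "$img" \
      "${CONFIG_PATH}" \
      "${MODEL_PATH}" \
      --show \
      --score-thr 0.5 \
      --dataset lvis
  done
}
```

For example, `run_demo_on_dir path/to/images`. Because `--show` is kept, a result window presumably opens for each image in turn.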

#### Inference with custom vocabularies
Detic can detect any class given its class name by using CLIP.
You can detect custom classes with the `--class-name` option:
```shell
python demo.py \
${IMAGE_PATH} \
${CONFIG_PATH} \
${MODEL_PATH} \
--show \
--score-thr 0.3 \
--class-name headphone webcam paper coffe
```

Note that `headphone`, `paper` and `coffe` (typo intended) are not LVIS classes. Despite the misspelled class name, Detic can produce a reasonable detection for `coffe`.
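Longer vocabularies are easier to keep in a text file. The following is a minimal sketch, assuming `--class-name` accepts one argument per class as the example above suggests. The hypothetical helper `run_demo_with_class_file` (not part of the project) reads one class name per line, a file format chosen here for illustration, and quotes each name so that a multi-word name such as `coffee mug` reaches `demo.py` as a single argument.

```shell
# Hypothetical helper (not part of the Detic project): read class names from a
# text file, one per line, and pass each as a separate --class-name argument.
# CONFIG_PATH and MODEL_PATH are the placeholders from the command above.
run_demo_with_class_file() {
  local image_path="$1" class_file="$2" name
  # Reuse the positional parameters as a list (POSIX sh has no arrays).
  set --
  # "|| [ -n "$name" ]" also keeps a last line that lacks a trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    if [ -n "$name" ]; then   # skip empty lines
      set -- "$@" "$name"
    fi
  done < "$class_file"
  python demo.py \
    "$image_path" \
    "${CONFIG_PATH}" \
    "${MODEL_PATH}" \
    --show \
    --score-thr 0.3 \
    --class-name "$@"
}
```

Collecting the names in the positional parameters rather than a Bash array keeps the helper usable under a plain POSIX `sh`; `"$@"` expands to one quoted word per class name.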
## Results
Here we only provide the Detic Swin-B model for the open-vocabulary demo. Multi-dataset training and open-vocabulary testing will be supported in the future.
To find more variants, please visit the [official model zoo](https://github.com/facebookresearch/Detic/blob/main/docs/MODEL_ZOO.md).
| Backbone | Training data | Config | Download |
| :------: | :------------------------: | :-------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Swin-B | ImageNet-21K & LVIS & COCO | [config](./configs/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/detic/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k/detic_centernet2_swin-b_fpn_4x_lvis-coco-in21k_20230120-0d301978.pth) |
## Citation
If you find Detic useful in your research or applications, please consider giving a star 🌟 to the [official repository](https://github.com/facebookresearch/Detic) and citing Detic with the following BibTeX entry.
```BibTeX
@inproceedings{zhou2022detecting,
title={Detecting Twenty-thousand Classes using Image-level Supervision},
author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
booktitle={ECCV},
year={2022}
}
```
## Checklist
- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
  - [x] Finish the code
  - [x] Basic docstrings & proper citation
  - [x] Test-time correctness
  - [x] A full README
- [ ] Milestone 2: Indicates a successful model implementation.
  - [ ] Training-time correctness
- [ ] Milestone 3: Good to be a part of our core package!
  - [ ] Type hints and docstrings
  - [ ] Unit tests
  - [ ] Code polishing
  - [ ] Metafile.yml
- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.