---
base_model: microsoft/conditional-detr-resnet-50
datasets:
- Voxel51/fisheye8k
library_name: transformers
license: mit
tags:
- generated_from_trainer
pipeline_tag: object-detection
model-index:
- name: fisheye8k_microsoft_conditional-detr-resnet-50
  results: []
---
# fisheye8k_microsoft_conditional-detr-resnet-50
This model is a fine-tuned version of [microsoft/conditional-detr-resnet-50](https://huggingface.co/microsoft/conditional-detr-resnet-50) on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. It is a key artifact of the **Mcity Data Engine** project.
* **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
* **Project Page**: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
* **Code**: [GitHub Repository for Mcity Data Engine](https://github.com/mcity/mcity_data_engine)
The model achieves the following results on the evaluation set:
- Loss: 1.4466 (validation loss at epoch 17, the last logged epoch)
## Model description
This object detection model was obtained by fine-tuning `microsoft/conditional-detr-resnet-50` on the Fisheye8K dataset. It was developed as part of the **Mcity Data Engine**, an open-source framework for iterative model improvement through open-vocabulary data selection. The Mcity Data Engine addresses the challenge of selecting and labeling suitable samples for machine learning models, particularly for detecting long-tail and rare classes of interest in large amounts of unlabeled data within Intelligent Transportation Systems (ITS). The fine-tuned model demonstrates a practical application of the Data Engine in improving roadside perception systems for autonomous driving and smart city applications.
The model detects five categories, as defined in its configuration: `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck`.
## Intended uses & limitations
This model is intended for object detection tasks within Intelligent Transportation Systems, specifically for identifying vehicles and vulnerable road users in visual data, such as that collected from fisheye cameras. Its primary use case is within the Mcity Data Engine framework for research and development related to improving perception models with rare and novel data.
**Limitations**:
* The model's performance may vary on data significantly different from the Fisheye8K dataset (e.g., different camera types, environments, or lighting conditions).
* Like all deep learning models, it may exhibit biases present in the training data and may not generalize perfectly to all real-world scenarios.
* Further evaluation on diverse real-world ITS data is recommended for specific deployment scenarios.
## Usage
You can use this model directly with the Hugging Face `transformers` library for object detection tasks.
```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import torch
from PIL import Image
import requests
# Load image processor and model
model_name = "mcity-data-engine/fisheye8k_microsoft_conditional-detr-resnet-50"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)
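# Example image (replace with your own URL or local path, ideally a fisheye traffic scene;
# this COCO sample may yield no detections for the model's traffic classes)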
url = "http://images.cocodataset.org/val2017/000000039769.jpg" # A standard COCO image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB") # Ensure RGB format
# Perform inference (gradients are not needed)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Post-process and print results
target_sizes = torch.tensor([image.size[::-1]]) # (height, width)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0] # Apply a confidence threshold
print("Detected objects:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f" - {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```
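For downstream use it is often convenient to group the post-processed detections by class name. The helper below is a minimal sketch and not part of the Mcity Data Engine: `group_detections` and `min_score` are hypothetical names, and the `id2label` mapping in the example is a made-up stand-in for `model.config.id2label` (check the real mapping in the model configuration). It assumes boxes in the `(x_min, y_min, x_max, y_max)` pixel format returned by `post_process_object_detection`.

```python
def group_detections(scores, labels, boxes, id2label, min_score=0.5):
    """Group detections by class name, dropping those below min_score.

    scores, labels and boxes are plain Python lists, e.g. obtained with
    results["scores"].tolist(), results["labels"].tolist() and
    results["boxes"].tolist().
    """
    grouped = {}
    for score, label, box in zip(scores, labels, boxes):
        if score < min_score:
            continue
        name = id2label[int(label)]
        x_min, y_min, x_max, y_max = box
        grouped.setdefault(name, []).append(
            {
                "score": round(score, 3),
                "box": [round(v, 2) for v in box],
                "area_px": round((x_max - x_min) * (y_max - y_min), 2),
            }
        )
    return grouped


# Illustrative values only; this id-to-name mapping is hypothetical.
example_id2label = {0: "Car", 1: "Pedestrian"}
example = group_detections(
    scores=[0.95, 0.40, 0.88],
    labels=[0, 0, 1],
    boxes=[[10.0, 20.0, 110.0, 70.0], [0.0, 0.0, 5.0, 5.0], [50.0, 60.0, 70.0, 120.0]],
    id2label=example_id2label,
)
print(example)
```

With real model output, pass `model.config.id2label` as `id2label` and choose `min_score` to suit your precision and recall needs.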
## Training and evaluation data
This model was fine-tuned on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. The Fisheye8K dataset contains images captured from fisheye cameras, primarily focusing on intelligent transportation system scenarios.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP
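The list above can be expressed as `transformers.TrainingArguments` keyword arguments. This is a sketch of one plausible mapping, not the project's actual training script: the `output_dir` value and the `build_training_arguments` helper are hypothetical, `per_device_train_batch_size=1` assumes a single device, and `fp16=True` is assumed to correspond to "Native AMP".

```python
# Keyword arguments mirroring the hyperparameter list (assumed mapping).
training_kwargs = {
    "output_dir": "fisheye8k_conditional_detr",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 1,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 0,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 36,
    "fp16": True,  # assumed equivalent of "Native AMP"; needs a CUDA-capable GPU
}


def build_training_arguments(**overrides):
    """Create TrainingArguments lazily so the dict can be inspected without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**{**training_kwargs, **overrides})
```

On a machine without a GPU, `build_training_arguments(fp16=False)` avoids the mixed-precision requirement.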
### Training results
Only the first 17 of the 36 configured epochs are logged below.
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0211 | 1.0 | 5288 | 1.5012 |
| 0.9117 | 2.0 | 10576 | 1.4713 |
| 0.8595 | 3.0 | 15864 | 1.4364 |
| 0.7922 | 4.0 | 21152 | 1.5227 |
| 0.7764 | 5.0 | 26440 | 1.6631 |
| 0.7419 | 6.0 | 31728 | 1.4320 |
| 0.7132 | 7.0 | 37016 | 1.4661 |
| 0.6991 | 8.0 | 42304 | 1.4318 |
| 0.6585 | 9.0 | 47592 | 1.4069 |
| 0.6527 | 10.0 | 52880 | 1.4213 |
| 0.6191 | 11.0 | 58168 | 1.4144 |
| 0.6248 | 12.0 | 63456 | 1.3887 |
| 0.6085 | 13.0 | 68744 | 1.4053 |
| 0.582 | 14.0 | 74032 | 1.4418 |
| 0.5592 | 15.0 | 79320 | 1.5815 |
| 0.552 | 16.0 | 84608 | 1.4832 |
| 0.5233 | 17.0 | 89896 | 1.4466 |
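A short script makes the table easier to read at a glance. The values are copied from the rows above; the variable names are illustrative. Reading the step count as roughly one training sample per step assumes a single device and no gradient accumulation, neither of which is stated in this card.

```python
# Validation loss per epoch, copied from the table above.
val_loss = [
    1.5012, 1.4713, 1.4364, 1.5227, 1.6631, 1.4320, 1.4661, 1.4318, 1.4069,
    1.4213, 1.4144, 1.3887, 1.4053, 1.4418, 1.5815, 1.4832, 1.4466,
]
final_step = 89896  # step count in the last row

# Epochs are 1-indexed in the table, list positions are 0-indexed.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
steps_per_epoch = final_step // len(val_loss)

print(f"lowest validation loss: {val_loss[best_epoch - 1]} at epoch {best_epoch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
```

The lowest validation loss (1.3887) occurs at epoch 12, while training loss keeps falling through epoch 17, so later checkpoints may be overfitting slightly.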
### Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
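To check whether a local environment matches these versions, a small comparison along the following lines may help. It is a sketch: `compare_versions` is a hypothetical helper, and the PyPI distribution names (for example `torch` for Pytorch) are assumptions.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; distribution names on PyPI are assumed.
expected = {
    "transformers": "4.48.3",
    "torch": "2.5.1+cu124",  # a CPU or other CUDA build reports a different suffix
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(expected_versions):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected_versions.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in compare_versions(expected).items():
    print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily break inference; it only flags differences from the training environment.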
## Acknowledgements
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
## Citation
If you use the Mcity Data Engine in your research, feel free to cite the project:
```bibtex
@article{bogdoll2025mcitydataengine,
title={Mcity Data Engine},
author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
year={2025}
}
```