---
base_model: microsoft/conditional-detr-resnet-50
datasets:
- Voxel51/fisheye8k
library_name: transformers
license: mit
tags:
- generated_from_trainer
pipeline_tag: object-detection
model-index:
- name: fisheye8k_microsoft_conditional-detr-resnet-50
  results: []
---
|
|
|
|
|
# fisheye8k_microsoft_conditional-detr-resnet-50 |
|
|
|
|
|
This model is a fine-tuned version of [microsoft/conditional-detr-resnet-50](https://huggingface.co/microsoft/conditional-detr-resnet-50) on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. It is a key artifact of the **Mcity Data Engine** project. |
|
|
|
|
|
* **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614) |
|
|
* **Project Page**: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/) |
|
|
* **Code**: [GitHub Repository for Mcity Data Engine](https://github.com/mcity/mcity_data_engine) |
|
|
|
|
|
It achieves the following results on the evaluation set: |
|
|
- Loss: 1.4466 |
|
|
|
|
|
## Model description |
|
|
|
|
|
This is an object detection model fine-tuned from the `microsoft/conditional-detr-resnet-50` checkpoint on the Fisheye8K dataset. It was developed as part of the **Mcity Data Engine**, an open-source framework for iterative model improvement through open-vocabulary data selection. The Mcity Data Engine addresses the challenge of selecting and labeling suitable training samples, particularly long-tail and rare classes of interest hidden in large amounts of unlabeled data, within Intelligent Transportation Systems (ITS). This fine-tuned model demonstrates how the Data Engine can be applied to improve roadside perception for autonomous driving and smart city applications.
|
|
|
|
|
The model detects the five categories defined in its configuration: `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck`.
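To confirm the label set programmatically, you can read the mapping from the model config. A minimal sketch (the index order shown in the comment is illustrative, not guaranteed):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "mcity-data-engine/fisheye8k_microsoft_conditional-detr-resnet-50"
)
# Prints the id-to-label mapping, e.g. {0: 'Bus', 1: 'Bike', 2: 'Car', ...}
print(config.id2label)
```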
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
This model is intended for object detection tasks within Intelligent Transportation Systems, specifically for identifying vehicles and vulnerable road users in visual data, such as that collected from fisheye cameras. Its primary use case is within the Mcity Data Engine framework for research and development related to improving perception models with rare and novel data. |
|
|
|
|
|
**Limitations**: |
|
|
* The model's performance may vary on data significantly different from the Fisheye8K dataset (e.g., different camera types, environments, or lighting conditions). |
|
|
* Like all deep learning models, it may exhibit biases present in the training data and may not generalize perfectly to all real-world scenarios. |
|
|
* Further evaluation on diverse real-world ITS data is recommended for specific deployment scenarios. |
|
|
|
|
|
## Usage |
|
|
|
|
|
You can use this model directly with the Hugging Face `transformers` library for object detection tasks. |
|
|
|
|
|
```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import torch
from PIL import Image
import requests

# Load image processor and model
model_name = "mcity-data-engine/fisheye8k_microsoft_conditional-detr-resnet-50"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)
model.eval()

# Example image (replace with your own image URL or local path)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # A standard COCO image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")  # Ensure RGB format

# Perform inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process and print results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]  # Keep only detections above the confidence threshold

print("Detected objects:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f" - {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```
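The returned boxes are in `(x_min, y_min, x_max, y_max)` pixel coordinates. To inspect results visually, you can draw them onto the image, for example with Pillow. A minimal sketch that continues from the snippet above:

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(image)
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    x_min, y_min, x_max, y_max = box.tolist()
    # Draw the bounding box and annotate it with the predicted class
    draw.rectangle([x_min, y_min, x_max, y_max], outline="red", width=2)
    draw.text((x_min, max(y_min - 12, 0)), model.config.id2label[label.item()], fill="red")
image.save("detections.jpg")
```

Note that `threshold=0.9` is conservative; lowering it (e.g. to 0.5) surfaces more, lower-confidence detections.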
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
This model was fine-tuned on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. Fisheye8K contains roughly 8,000 images captured by fisheye traffic cameras, annotated with bounding boxes for the five road-user classes listed above, and focuses on intelligent transportation system scenarios.
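The dataset is published in FiftyOne format on the Hugging Face Hub, so one way to explore it locally is via FiftyOne's Hub integration. A minimal sketch, assuming a recent `fiftyone` release with `fiftyone.utils.huggingface` available:

```python
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Download (or load a cached copy of) the dataset from the Hub
dataset = fouh.load_from_hub("Voxel51/fisheye8k")
print(dataset)

# Browse images and ground-truth boxes in the FiftyOne App
session = fo.launch_app(dataset)
```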
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
|
|
- learning_rate: 5e-05 |
|
|
- train_batch_size: 1 |
|
|
- eval_batch_size: 8 |
|
|
- seed: 0 |
|
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
|
|
- lr_scheduler_type: cosine |
|
|
- num_epochs: 36 |
|
|
- mixed_precision_training: Native AMP |
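These settings map onto the `transformers` `TrainingArguments` API roughly as follows. This is a hedged sketch, not the exact training script: `output_dir` is a placeholder, and the data collation and any early-stopping setup are omitted.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="fisheye8k_microsoft_conditional-detr-resnet-50",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    num_train_epochs=36,
    fp16=True,  # "Native AMP" mixed-precision training
)
```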
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|
|:-------------:|:-----:|:-----:|:---------------:| |
|
|
| 1.0211 | 1.0 | 5288 | 1.5012 | |
|
|
| 0.9117 | 2.0 | 10576 | 1.4713 | |
|
|
| 0.8595 | 3.0 | 15864 | 1.4364 | |
|
|
| 0.7922 | 4.0 | 21152 | 1.5227 | |
|
|
| 0.7764 | 5.0 | 26440 | 1.6631 | |
|
|
| 0.7419 | 6.0 | 31728 | 1.4320 | |
|
|
| 0.7132 | 7.0 | 37016 | 1.4661 | |
|
|
| 0.6991 | 8.0 | 42304 | 1.4318 | |
|
|
| 0.6585 | 9.0 | 47592 | 1.4069 | |
|
|
| 0.6527 | 10.0 | 52880 | 1.4213 | |
|
|
| 0.6191 | 11.0 | 58168 | 1.4144 | |
|
|
| 0.6248 | 12.0 | 63456 | 1.3887 | |
|
|
| 0.6085 | 13.0 | 68744 | 1.4053 | |
|
|
| 0.582 | 14.0 | 74032 | 1.4418 | |
|
|
| 0.5592 | 15.0 | 79320 | 1.5815 | |
|
|
| 0.552 | 16.0 | 84608 | 1.4832 | |
|
|
| 0.5233 | 17.0 | 89896 | 1.4466 | |
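Although training was configured for 36 epochs, the logged results end at epoch 17, and the epoch-17 validation loss matches the reported evaluation loss of 1.4466; this suggests training stopped early (e.g., via early stopping on the validation loss).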
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- Transformers 4.48.3 |
|
|
- Pytorch 2.5.1+cu124 |
|
|
- Datasets 3.2.0 |
|
|
- Tokenizers 0.21.0 |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support! |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use the Mcity Data Engine in your research, feel free to cite the project: |
|
|
|
|
|
```bibtex |
|
|
@misc{bogdoll2025mcitydataengine,
  title        = {Mcity Data Engine},
  author       = {Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  year         = {2025},
  howpublished = {\url{https://github.com/mcity/mcity_data_engine}},
}
|
|
``` |