danielbogdoll and nielsr (HF Staff) committed
Commit 69831f8 · verified · 1 Parent(s): bb4b3ee

Improve model card: Add pipeline tag, update license, and expand content (#4)


- Improve model card: Add pipeline tag, update license, and expand content (7136a3f75e8b1d4d6fde197592ff385d9c3fc623)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +73 -13
README.md CHANGED
@@ -1,36 +1,82 @@
1
  ---
2
- library_name: transformers
3
- license: apache-2.0
4
  base_model: microsoft/conditional-detr-resnet-50
5
- tags:
6
- - generated_from_trainer
7
  datasets:
8
  - Voxel51/fisheye8k
 
 
 
 
 
9
  model-index:
10
  - name: fisheye8k_microsoft_conditional-detr-resnet-50
11
  results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
-
17
  # fisheye8k_microsoft_conditional-detr-resnet-50
18
 
19
- This model is a fine-tuned version of [microsoft/conditional-detr-resnet-50](https://huggingface.co/microsoft/conditional-detr-resnet-50) on the generator dataset.
 
 
 
 
 
20
  It achieves the following results on the evaluation set:
21
  - Loss: 1.4466
22
 
23
  ## Model description
24
 
25
- More information needed
 
 
26
 
27
  ## Intended uses & limitations
28
 
29
- More information needed

30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
  ## Training procedure
36
 
@@ -68,7 +114,6 @@ The following hyperparameters were used during training:
68
  | 0.552 | 16.0 | 84608 | 1.4832 |
69
  | 0.5233 | 17.0 | 89896 | 1.4466 |
70
 
71
-
72
  ### Framework versions
73
 
74
  - Transformers 4.48.3
@@ -76,4 +121,19 @@ The following hyperparameters were used during training:
76
  - Datasets 3.2.0
77
  - Tokenizers 0.21.0
78
 
79
- Mcity Data Engine: https://arxiv.org/abs/2504.21614

1
  ---
 
 
2
  base_model: microsoft/conditional-detr-resnet-50
 
 
3
  datasets:
4
  - Voxel51/fisheye8k
5
+ library_name: transformers
6
+ license: mit
7
+ tags:
8
+ - generated_from_trainer
9
+ pipeline_tag: object-detection
10
  model-index:
11
  - name: fisheye8k_microsoft_conditional-detr-resnet-50
12
  results: []
13
  ---
14
 
 
 
 
15
  # fisheye8k_microsoft_conditional-detr-resnet-50
16
 
17
+ This model is a fine-tuned version of [microsoft/conditional-detr-resnet-50](https://huggingface.co/microsoft/conditional-detr-resnet-50) on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. It is a key artifact of the **Mcity Data Engine** project.
18
+
19
+ * **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
20
+ * **Project Page**: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
21
+ * **Code**: [GitHub Repository for Mcity Data Engine](https://github.com/mcity/mcity_data_engine)
22
+
23
  It achieves the following results on the evaluation set:
24
  - Loss: 1.4466
25
 
26
  ## Model description
27
 
28
+ This is an object detection model fine-tuned from the `microsoft/conditional-detr-resnet-50` checkpoint on the Fisheye8K dataset. It was developed as part of the **Mcity Data Engine**, an open-source framework for iterative model improvement through open-vocabulary data selection. The Mcity Data Engine addresses the challenge of selecting and labeling suitable samples for machine learning models, in particular for detecting long-tail and rare classes of interest in large amounts of unlabeled data within Intelligent Transportation Systems (ITS). This fine-tuned model shows the Data Engine applied to roadside perception for autonomous driving and smart city applications.
29
+
30
+ The model is trained to detect specific categories, as defined in its configuration: `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck`.
31
 
32
  ## Intended uses & limitations
33
 
34
+ This model is intended for object detection tasks within Intelligent Transportation Systems, specifically for identifying vehicles and vulnerable road users in visual data, such as that collected from fisheye cameras. Its primary use case is within the Mcity Data Engine framework for research and development related to improving perception models with rare and novel data.
35
+
36
+ **Limitations**:
37
+ * The model's performance may vary on data significantly different from the Fisheye8K dataset (e.g., different camera types, environments, or lighting conditions).
38
+ * Like all deep learning models, it may exhibit biases present in the training data and may not generalize perfectly to all real-world scenarios.
39
+ * Further evaluation on diverse real-world ITS data is recommended for specific deployment scenarios.
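For the further evaluation recommended above, a small overlap check can be a useful first step before running a full benchmark. The sketch below is an illustration only and not part of the Mcity Data Engine: `box_iou` and `match_rate` are hypothetical helper names, and boxes are assumed to be in the absolute `(xmin, ymin, xmax, ymax)` format that `post_process_object_detection` returns. The result is a simple recall-style figure, not mAP.

```python
from typing import Sequence


def box_iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = max(0.0, ax1 - ax0) * max(0.0, ay1 - ay0)
    area_b = max(0.0, bx1 - bx0) * max(0.0, by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def match_rate(predicted, ground_truth, iou_threshold=0.5):
    """Fraction of ground-truth boxes overlapped by at least one prediction."""
    if not ground_truth:
        return 0.0
    hits = sum(
        any(box_iou(gt, pred) >= iou_threshold for pred in predicted)
        for gt in ground_truth
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical boxes: one prediction, two annotated objects.
    preds = [(10.0, 10.0, 50.0, 50.0)]
    truth = [(12.0, 12.0, 48.0, 52.0), (200.0, 200.0, 240.0, 240.0)]
    print(match_rate(preds, truth))
```

The check ignores class labels and confidence scores, so it only indicates whether objects are being localized at all on the new data.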
40
+
41
+ ## Usage
42
+
43
+ You can use this model directly with the Hugging Face `transformers` library for object detection tasks.
44
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForObjectDetection
+ import torch
+ from PIL import Image
+ import requests
+
+ # Load image processor and model
+ model_name = "mcity-data-engine/fisheye8k_microsoft_conditional-detr-resnet-50"
+ processor = AutoImageProcessor.from_pretrained(model_name)
+ model = AutoModelForObjectDetection.from_pretrained(model_name)
+
+ # Example image (replace with your own image URL or local path)
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg" # A standard COCO image
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB") # Ensure RGB format
+
+ # Perform inference
+ inputs = processor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
+
+ # Post-process and print results
+ target_sizes = torch.tensor([image.size[::-1]]) # (height, width)
+ results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0] # Apply a confidence threshold
+
+ print("Detected objects:")
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
+     box = [round(i, 2) for i in box.tolist()]
+     print(
+         f" - {model.config.id2label[label.item()]} with confidence "
+         f"{round(score.item(), 3)} at location {box}"
+     )
+ ```
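The `results` dictionary produced above can be narrowed to the classes of interest, for example vulnerable road users. The following is a minimal sketch under stated assumptions: `filter_detections` is a hypothetical helper, and the `id2label` ordering in the demo is invented for illustration. The real mapping should be read from `model.config.id2label`.

```python
def filter_detections(results, id2label, keep=("Pedestrian", "Bike"), min_score=0.5):
    """Return (label, score, box) tuples for the selected class names.

    Accepts either torch tensors or plain Python lists as values of `results`.
    """
    kept = []
    for score, label_id, box in zip(results["scores"], results["labels"], results["boxes"]):
        name = id2label[int(label_id)]
        if name in keep and float(score) >= min_score:
            rounded_box = [round(float(v), 2) for v in box]
            kept.append((name, round(float(score), 3), rounded_box))
    return kept


if __name__ == "__main__":
    # Hypothetical label order; use model.config.id2label in practice.
    demo_id2label = {0: "Bus", 1: "Bike", 2: "Car", 3: "Pedestrian", 4: "Truck"}
    # Hand-written stand-in for the output of post_process_object_detection.
    demo_results = {
        "scores": [0.95, 0.40, 0.80],
        "labels": [3, 1, 2],
        "boxes": [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0], [5.0, 5.0, 9.0, 9.0]],
    }
    for name, score, box in filter_detections(demo_results, demo_id2label):
        print(f"{name}: {score} at {box}")
```

Filtering by name rather than by numeric id keeps the code valid if the label order in the checkpoint differs from the one assumed here.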
76
 
77
  ## Training and evaluation data
78
 
79
+ This model was fine-tuned on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. The Fisheye8K dataset contains images captured from fisheye cameras, primarily focusing on intelligent transportation system scenarios.
80
 
81
  ## Training procedure
82
 
 
114
  | 0.552 | 16.0 | 84608 | 1.4832 |
115
  | 0.5233 | 17.0 | 89896 | 1.4466 |
116
 
 
117
  ### Framework versions
118
 
119
  - Transformers 4.48.3
 
121
  - Datasets 3.2.0
122
  - Tokenizers 0.21.0
123
 
124
+ ## Acknowledgements
125
+
126
+ Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
127
+
128
+ ## Citation
129
+
130
+ If you use the Mcity Data Engine in your research, feel free to cite the project:
131
+
+ ```bibtex
+ @article{bogdoll2025mcitydataengine,
+   title={Mcity Data Engine},
+   author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
+   journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
+   year={2025}
+ }
+ ```