Quantization Robustness to Input Degradations for Object Detection
Abstract
Post-training quantization of YOLO models is evaluated for robustness to real-world degradations, with a focus on the effectiveness of a degradation-aware calibration strategy for Static INT8 quantization.
Post-training quantization (PTQ) is crucial for deploying efficient object detection models, like YOLO, on resource-constrained devices. However, the impact of reduced precision on model robustness to real-world input degradations such as noise, blur, and compression artifacts is a significant concern. This paper presents a comprehensive empirical study evaluating the robustness of YOLO models (nano to extra-large scales) across multiple precision formats: FP32, FP16 (TensorRT), Dynamic UINT8 (ONNX), and Static INT8 (TensorRT). We introduce and evaluate a degradation-aware calibration strategy for Static INT8 PTQ, where the TensorRT calibration process is exposed to a mix of clean and synthetically degraded images. Models were benchmarked on the COCO dataset under seven distinct degradation conditions (including various types and levels of noise, blur, low contrast, and JPEG compression) and a mixed-degradation scenario. Results indicate that while Static INT8 TensorRT engines offer substantial speedups (~1.5-3.3x) with a moderate accuracy drop (~3-7% mAP50-95) on clean data, the proposed degradation-aware calibration did not yield consistent, broad improvements in robustness over standard clean-data calibration across most models and degradations. A notable exception was observed for larger model scales under specific noise conditions, suggesting model capacity may influence the efficacy of this calibration approach. These findings highlight the challenges in enhancing PTQ robustness and provide insights for deploying quantized detectors in uncontrolled environments. All code and evaluation tables are available at https://github.com/AllanK24/QRID.
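To make the calibration mechanism concrete: below is a minimal sketch of an INT8 entropy calibrator built on TensorRT's Python API. It simply streams whatever list of images it is given, so the degradation-aware variant only changes which paths appear in that list. This is not the authors' implementation; the class name MixedImageCalibrator, the preprocess helper, the 640×640 resolution, and the batch size are illustrative assumptions.

```python
# Minimal sketch of a TensorRT INT8 entropy calibrator that streams a caller-supplied
# list of calibration images (e.g., a 50/50 clean + degraded mix) to the builder.
# Names, preprocessing, and resolution are illustrative, not taken from the paper's code.
import os

import cv2
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)


def preprocess(path, size=640):
    """Resize + BGR->RGB + CHW float32 in [0, 1]; simplified (no letterboxing)."""
    img = cv2.resize(cv2.imread(path), (size, size))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return img.transpose(2, 0, 1)  # HWC -> CHW


class MixedImageCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, image_paths, batch_size=8, size=640, cache_file="calib.cache"):
        super().__init__()
        self.paths, self.batch_size, self.size = image_paths, batch_size, size
        self.cache_file, self.idx = cache_file, 0
        n_bytes = batch_size * 3 * size * size * np.dtype(np.float32).itemsize
        self.d_input = cuda.mem_alloc(n_bytes)  # device buffer reused for every batch

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.idx + self.batch_size > len(self.paths):
            return None  # no more data: calibration stops here
        batch = np.stack([preprocess(p, self.size)
                          for p in self.paths[self.idx:self.idx + self.batch_size]])
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        self.idx += self.batch_size
        return [int(self.d_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

During engine building, such a calibrator is attached via the builder config (config.set_flag(trt.BuilderFlag.INT8); config.int8_calibrator = MixedImageCalibrator(paths)); the rest of the PTQ pipeline stays the same.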
Community
We benchmark how post-training quantization (PTQ) affects the robustness of YOLO detectors under real-world image degradations. Static INT8 (TensorRT) yields ~1.5–3.3× speedups but costs ~3–7 mAP50-95 on clean COCO; robustness mostly doesn’t improve with a simple “degradation-aware” (50/50 clean+corrupted) calibration, apart from a few gains on the largest model.
What we did. Evaluated five YOLO scales (n→x) across FP32 / FP16 (TensorRT), Dynamic UINT8 (ONNX), and Static INT8 (TensorRT) on COCO val2017, then re-tested under seven degradation settings (Gaussian noise and blur at two severities each, low contrast, heavy JPEG compression, plus a mixed set). Static INT8 calibration used either clean images only or a 50/50 clean+degraded mix. Batch size 1; we report mAP50-95, mAP50, and latency.
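As a concrete illustration of that 50/50 mix, the sketch below samples a COCO subset, leaves half of the images untouched, applies one of four OpenCV-based corruptions to the other half, and writes everything to a folder a calibration pipeline can point at. The function build_mixed_calibration_set and all severity parameters are our own illustrative choices, not the paper's exact settings.

```python
# Sketch of assembling a 50/50 clean + degraded calibration set with OpenCV.
# Degradation types follow the paper (noise, blur, low contrast, JPEG); the exact
# severities and the sampling scheme here are illustrative assumptions.
import glob
import os
import random

import cv2
import numpy as np


def gaussian_noise(img, sigma=25):
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def gaussian_blur(img, ksize=7):
    return cv2.GaussianBlur(img, (ksize, ksize), 0)


def low_contrast(img, alpha=0.5):
    # Scale pixel values toward mid-gray to reduce contrast.
    return cv2.convertScaleAbs(img, alpha=alpha, beta=127.0 * (1.0 - alpha))


def jpeg_compress(img, quality=10):
    _, buf = cv2.imencode(".jpg", img, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)


DEGRADATIONS = [gaussian_noise, gaussian_blur, low_contrast, jpeg_compress]


def build_mixed_calibration_set(clean_dir, out_dir, n_images=512, seed=0):
    """Write n_images to out_dir: even indices stay clean, odd indices get one random degradation."""
    random.seed(seed)
    os.makedirs(out_dir, exist_ok=True)
    # Assumes clean_dir contains at least n_images JPEGs.
    paths = random.sample(sorted(glob.glob(os.path.join(clean_dir, "*.jpg"))), n_images)
    for i, path in enumerate(paths):
        img = cv2.imread(path)
        if i % 2 == 1:  # 50/50 clean / degraded split
            img = random.choice(DEGRADATIONS)(img)
        cv2.imwrite(os.path.join(out_dir, os.path.basename(path)), img)


# Example: build_mixed_calibration_set("coco/val2017", "calib_mixed")
```

Note that the calibration set stays the same size as a clean-only set; only the image distribution changes.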
Key takeaways:
FP16 matches FP32 accuracy and cuts latency markedly (e.g., YOLO-x: 61.3→18.2 ms). Dynamic UINT8 (ONNX) preserves FP32 accuracy but runs slower than the FP32 TensorRT engine. Static INT8 is fastest but drops a few mAP points on clean data; a sketch of producing the four precision variants follows after the takeaways.
Noise is the real villain. Medium Gaussian noise causes the largest relative mAP drops across models/precisions; blur hurts moderately; low contrast & heavy JPEG have small effects.
Degradation-aware calibration (50/50 mix) generally mirrors clean-calibrated INT8; the notable exception is YOLO-x under noise (e.g., Noisy-Medium: a 28.1% drop with mixed calibration vs 34.7% with clean calibration; see the note on relative drops below).
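For reference, here is a rough sketch of how the four precision variants compared in the takeaways can be produced, assuming Ultralytics YOLO checkpoints and ONNX Runtime's dynamic quantizer. The checkpoint name is illustrative, and the paper's actual export/engine-build pipeline may differ; in particular, a custom TensorRT calibrator fed clean or mixed images can replace the int8=True shortcut.

```python
# Sketch of producing the four precision variants compared above.
# Assumes Ultralytics YOLO checkpoints and ONNX Runtime; names are illustrative.
from onnxruntime.quantization import QuantType, quantize_dynamic
from ultralytics import YOLO

model = YOLO("yolo11x.pt")  # any scale n/s/m/l/x; the checkpoint name is an assumption

# FP32 and FP16 TensorRT engines
# (successive exports write to the same .engine path, so keep separate copies in practice)
model.export(format="engine", half=False)  # FP32 baseline engine
model.export(format="engine", half=True)   # FP16 engine

# Dynamic UINT8 via ONNX Runtime: weights quantized offline, activations at runtime
onnx_path = model.export(format="onnx")
quantize_dynamic(onnx_path, "yolo_dynamic_uint8.onnx", weight_type=QuantType.QUInt8)

# Static INT8 TensorRT engine; `data` points at the calibration images
# (clean-only or the 50/50 clean+degraded mix described above)
model.export(format="engine", int8=True, data="coco.yaml")
```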
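A note on units: the drop percentages quoted for the noise comparison are relative mAP drops, i.e., the share of the clean-data mAP lost under a degradation, not absolute mAP points. A minimal helper with purely illustrative values:

```python
def relative_map_drop(map_clean: float, map_degraded: float) -> float:
    """Relative mAP drop in percent: share of the clean-data mAP lost under degradation."""
    return 100.0 * (map_clean - map_degraded) / map_clean


# Illustrative values only (not taken from the paper's tables):
print(relative_map_drop(0.50, 0.36))  # ~28.0, i.e., a 28% relative drop
```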