Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Real-Time Scene-Adaptive Tone Mapping for High-Dynamic Range Object Detection
Authors: Gongzhe Li, Linwei Qiu, Peibei Cao, Fengying Xie, Xiangyang Ji, Qilin Sun
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the RoD dataset [15], which contains 20,089 24-bit HDR RAW images. Unlike RAODNet [15], we test our proposed method on mixed scenes to validate its effectiveness and generality. Implementation Details. We employ two widely used object detectors: Faster R-CNN (ResNet50) [4] and YOLOv3 (DarkNet53) [5]. Our implementation is based on the MMDetection [54] codebase. Following the setup in [15], HDR RAW images are processed with linear demosaicing [55, 56] to restore color channels and resized to 1280 × 1280 size. During training, we apply random flipping for data augmentation, use a batch size of 8, and train for 14 epochs with an initial learning rate of 1e-2. The learning rate is decayed by a factor of 10 at epochs 8 and 11. And we use Faster R-CNN [4] as the main detector for the following ablation studies. To accelerate adaptation to the HDR RAW domain, we initialize the model with COCO [48] pretrained weights. Evaluation. We evaluate performance using mean Average Precision (mAP) and mean Average Recall (mAR) across all Intersection over Union (IoU) thresholds, along with Average Precision (AP) at IoU thresholds of 0.5 (AP50) and 0.75 (AP75). Additionally, we test model complexity in terms of the parameters (K), computational complexity (FLOPs), and inference latency (ms) on NVIDIA Jetson platforms. |
| Researcher Affiliation | Collaboration | Gongzhe Li¹, Linwei Qiu², Peibei Cao³, Fengying Xie², Xiangyang Ji⁴, Qilin Sun¹,⁵. ¹School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; ²Tianmushan Laboratory, Beihang University, Hangzhou, China; ³School of Artificial Intelligence, Nanjing University of Information Science and Technology, China; ⁴Department of Automation, Tsinghua University, Beijing, China; ⁵Point Spread Technology, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures formatted as such. |
| Open Source Code | No | Our code will be available after submission. |
| Open Datasets | Yes | We evaluate our method on the RoD dataset [15], which contains 20,089 24-bit HDR RAW images. Unlike RAODNet [15], we test our proposed method on mixed scenes to validate its effectiveness and generality. ... To accelerate adaptation to the HDR RAW domain, we initialize the model with COCO [48] pretrained weights. ... We also evaluate the RAW domain adapter methods [49, 17], with the results shown in Table 2. Our approach optimizes only 33.6% of the total parameters of Faster R-CNN (tone mapper: 0.10% + detection head: 33.6%), yet achieves state-of-the-art performance compared to other methods. Ablations on Scene Generalization. To evaluate cross-scene generalization, we divided the RoD dataset [15] into two subsets based on scene lighting. ... Ablations on Advanced Detector. We also evaluate advanced detectors (Sparse-RCNN [7] and Deformable DETR [8]) in our method. The experimental results in Table 6 show that our method significantly improves detection performance on HDR RAW images, outperforming HDR ISP by 6.3% and 4.8%, respectively. These results demonstrate that our method exhibits improved generalization across various downstream detectors. Analysis of Scaling-Invariant Tone Mapper. To evaluate the generalization ability of our proposed scaling-invariant tone mapper, we train the model on the RoD dataset [15] and test it on the RhoVision dataset [33], which contains HDR RAW images captured by the same sensor in different scenes. |
| Dataset Splits | No | We evaluate our method on the RoD dataset [15], which contains 20,089 24-bit HDR RAW images. Unlike RAODNet [15], we test our proposed method on mixed scenes to validate its effectiveness and generality. ... To evaluate cross-scene generalization, we divided the RoD dataset [15] into two subsets based on scene lighting. ... To accelerate adaptation to the HDR RAW domain, we initialize the model with COCO [48] pretrained weights. ... We train the model on the RoD dataset [15] and test it on the RhoVision dataset [33]. |
| Hardware Specification | Yes | The proposed method outperforms traditional tone mapping algorithms and advanced AI-ISP methods in challenging automotive HDR scenes. Moreover, our pipeline achieves real-time processing of 4K high-bit-depth HDR inputs on NVIDIA Jetson platforms. ... Figure 1: Comparison of detection performance and model complexity on the RoD dataset using Faster R-CNN (1280 × 1280 resolution). The symbol indicates the performance of our Ours (Lite) model with 4K resolution input, achieving 45 FPS on NVIDIA Jetson platforms (16-bit float precision). ... We evaluate the latency and mAP performance of YOLOv3 [5] on the NVIDIA Jetson AGX Orin (16-bit float precision), with the results shown in Fig. 5 (left). |
| Software Dependencies | No | Our implementation is based on the MMDetection [54] codebase. |
| Experiment Setup | Yes | Implementation Details. We employ two widely used object detectors: Faster R-CNN (ResNet50) [4] and YOLOv3 (DarkNet53) [5]. Our implementation is based on the MMDetection [54] codebase. Following the setup in [15], HDR RAW images are processed with linear demosaicing [55, 56] to restore color channels and resized to 1280 × 1280 size. During training, we apply random flipping for data augmentation, use a batch size of 8, and train for 14 epochs with an initial learning rate of 1e-2. The learning rate is decayed by a factor of 10 at epochs 8 and 11. And we use Faster R-CNN [4] as the main detector for the following ablation studies. To accelerate adaptation to the HDR RAW domain, we initialize the model with COCO [48] pretrained weights. |
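
The learning-rate schedule quoted in the Experiment Setup row (initial rate 1e-2, divided by 10 at epochs 8 and 11, 14 epochs in total) can be illustrated with a small sketch. This is not the authors' code, which has not been released. The function name `lr_at_epoch` is hypothetical, and the sketch assumes zero-based epoch indices with the decay taking effect from the milestone epoch onward; the paper does not state which indexing convention it uses.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr=1e-2, milestones=(8, 11), gamma=0.1):
    """Return the step-decayed learning rate for a zero-based epoch index.

    Assumption: the decay applies from each milestone epoch onward.
    """
    # Count how many milestones are <= epoch; each one multiplies the rate by gamma.
    num_decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    # Print the schedule for the 14 training epochs reported in the paper.
    for epoch in range(14):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.0e}")
```

Under these assumptions the rate stays at 1e-2 for epochs 0 to 7, drops to 1e-3 for epochs 8 to 10, and to 1e-4 for epochs 11 to 13. If the paper counts epochs from one, each drop would occur one index earlier in this zero-based view.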