AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection
Authors: Yujin Wang, Tianyi Xu, Zhang Fan, Tianfan Xue, Jinwei Gu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that AdaptiveISP not only surpasses the prior state-of-the-art methods for object detection but also dynamically manages the trade-off between detection performance and computational cost, especially suitable for scenes with large dynamic range variations. Project website: https://openimaginglab.github.io/AdaptiveISP/. |
| Researcher Affiliation | Academia | ¹Shanghai AI Laboratory, ²Peking University, {wangyujin, zhangfan}@pjlab.org.cn, photon@stu.pku.edu.cn; ³The Chinese University of Hong Kong, {tfxue@ie, jwgu@cse}.cuhk.edu.hk |
| Pseudocode | No | The paper does not include structured pseudocode or algorithm blocks. It describes mathematical formulations and discusses modules, but not in a pseudocode format. |
| Open Source Code | No | Project website: https://openimaginglab.github.io/AdaptiveISP/. Code and datasets will be made publicly available upon acceptance. |
| Open Datasets | Yes | LOD. LOD Dataset [9] is a real-world low-light object detection dataset... OnePlus. OnePlus Dataset [39] is a real-world low-light object detection dataset... Raw COCO. COCO [19] is a large-scale object detection, segmentation, and captioning dataset. |
| Dataset Splits | Yes | LOD. LOD Dataset [9]... There are 1,830 data pairs for training and 400 data pairs for validation... OnePlus. OnePlus Dataset [39]... There are 50 pairs for training and 91 pairs for validation... Raw COCO... we convert the COCO validation dataset (5,000 images) to a synthetic raw-like dataset as our evaluation dataset... To validate the necessity of Image Signal Processing (ISP), we conducted a series of experiments on a raw COCO dataset [19]... The final experimental results were compared on the raw COCO validation set... Because the released dataset is only a training dataset that provides paired raw images and annotations, we randomly split 80% of the dataset (12,800) for training, with the remainder as our validation dataset (3,200). |
| Hardware Specification | Yes | To verify the practicality of our approach, we conducted speed tests using the NVIDIA GTX 1660 Ti GPU... Our training comprises 100,000 iterations on one NVIDIA RTX 3090 (24G) GPU for the LOD dataset [9], which is completed in around 24 hours. |
| Software Dependencies | No | The paper mentions using YOLOv3 [32] as the detection model and the Adam optimizer, as well as rawpy for demosaicing, but does not specify version numbers for these or other software libraries/dependencies. |
| Experiment Setup | Yes | Similar to [39], YOLOv3 [32] is utilized as the detection model in all methods unless explicitly stated otherwise... During training and inference, we follow [9] to use a fixed input resolution of 512 × 512... We utilize the Adam optimizer with an initial learning rate of 3e-5 and a batch size of 8. The learning rate gradually decreases by a factor of $\lambda = 0.1^{3 \cdot \mathrm{iter}/\mathrm{iter}_{\mathrm{total}}}$. Note that both the policy network and value network use the same initial learning rate. Our training comprises 100,000 iterations on one NVIDIA RTX 3090 (24G) GPU for the LOD dataset [9], which is completed in around 24 hours. |
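
The learning-rate schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the decay factor reads 0.1^(3 · iter / iter_total), applied multiplicatively to the initial learning rate of 3e-5 over 100,000 iterations. The function name `lr_at` and its parameters are hypothetical; `decay_base` and `decay_power` are exposed so that a different reading of the formula can be substituted.

```python
def lr_at(iteration, total_iters=100_000, base_lr=3e-5,
          decay_base=0.1, decay_power=3.0):
    """Learning rate at `iteration` under an assumed exponential decay.

    Assumption (not confirmed by the source): the decay factor is
    decay_base ** (decay_power * iteration / total_iters).
    """
    if not 0 <= iteration <= total_iters:
        raise ValueError("iteration must lie in [0, total_iters]")
    progress = iteration / total_iters  # fraction of training completed
    return base_lr * decay_base ** (decay_power * progress)


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for it in (0, 50_000, 100_000):
        print(f"iter {it:>7}: lr = {lr_at(it):.3e}")
```

Under this reading the rate falls by three orders of magnitude over training. Per the quoted setup, the policy and value networks would start from the same `base_lr`.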