Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Universal Rainy Image Restoration: Benchmark and Baseline
Authors: Hujie Yan
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To benchmark this task, we construct a high-quality dataset called URIR-8K, which contains four patterns: rain streak, raindrop, rain accumulation and nighttime rain. Building upon this dataset, we present a comprehensive study on existing approaches by evaluating their universal deraining capabilities and their effect on downstream object detection task. In addition, we design a multi-scale vision Mamba as a baseline model, leveraging the benefits of multi-scale learning for its robustness to diverse rain appearances. Extensive experimental analysis shows the potential of our proposed task and the effectiveness of our model. We conduct benchmark experiments on the proposed URIR-8K dataset. We compare it with both classical and recent methods, including four CNN-based approaches: LPNet (Fu et al. 2019), JORDER-E (Yang et al. 2019), RCDNet (Wang et al. 2020a), and SPDNet (Yi et al. 2021); four Transformer-based networks: IDT (Xiao et al. 2022), Restormer (Zamir et al. 2022), DRSformer (Chen et al. 2023a), and PromptIR (Potlapalli et al. 2023); and one Mamba-based architecture, MambaIR (Guo et al. 2024). To assess the quality of the restored images, we employ two metrics: PSNR (Huynh-Thu and Ghanbari 2008) and SSIM (Wang et al. 2004). To evaluate the performance in object detection tasks, we employ the mAP50 metric (Redmon et al. 2016) to measure accuracy in traffic scenes. |
| Researcher Affiliation | Academia | California Institute of Technology, 1200 E California Blvd, Pasadena, California 91125, USA |
| Pseudocode | No | The paper includes figures illustrating the overall architecture and mathematical equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps are present in the main text. |
| Open Source Code | No | Note that we provide a unified pipeline for synthesizing datasets of four common rain patterns, including rain streak, raindrop, rain accumulation, and nighttime rain. We would open-source this pipeline's code, offering researchers a convenient reference for URIR data generation in other scenarios. This is a statement of future intent ('We would open-source') rather than an immediate release or a direct link to a repository. |
| Open Datasets | Yes | To facilitate the real-world application of universal rainy image restoration, we collect numerous rain-free backgrounds from the BDD100K dataset (Seita 2018). Our ground-truth data includes a diverse range of typical daytime and nighttime first-person driving scenes in urban environments. To evaluate the performance of existing approaches for universal rainy image restoration, we first create a high-quality benchmark dataset named URIR-8K. |
| Dataset Splits | Yes | In total, our URIR-8K dataset contains 7,200 training pairs and 800 test images, which is divided into four subsets: rain streaks (RS), raindrops (RD), rain accumulation (RA), and nighttime rain (NR). These scenes and data for the training and test sets are completely separate, ensuring no overlap. |
| Hardware Specification | Yes | All experiments are conducted with a batch size of 1 and a patch size of 256, utilizing one NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The models are trained using the PyTorch framework with the Adam optimizer. While PyTorch and the Adam optimizer are mentioned, no version numbers are given for these software components. |
| Experiment Setup | Yes | In our network, the stack size of MSSMs is set to 6. The models are trained using the PyTorch framework with the Adam optimizer. The initial learning rate is 1×10⁻⁴, which is gradually decreased to 1×10⁻⁶ following a cosine annealing strategy (Loshchilov and Hutter 2016). An exception is made for the training on Rain200H, where the initial learning rate is set to 2×10⁻⁴. The models are trained on the URIR-8K, Rain200L, Rain200H, and UAV-Rain1K datasets for 300 epochs, and on SPA-Data for 5 epochs. All experiments are conducted with a batch size of 1 and a patch size of 256. |
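The Experiment Setup row describes a cosine annealing schedule from 1×10⁻⁴ down to 1×10⁻⁶ over 300 epochs. As a minimal sketch (the paper itself presumably uses PyTorch's built-in scheduler; the function name and signature below are illustrative, not from the paper), the quoted schedule follows the standard cosine annealing formula:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing schedule (Loshchilov and Hutter 2016):
    decays from lr_max at epoch 0 to lr_min at total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

# With the quoted settings (300-epoch runs on URIR-8K):
start = cosine_annealing_lr(0, 300)    # 1e-4, the initial rate
end = cosine_annealing_lr(300, 300)    # 1e-6, the final rate
```

In practice this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=300` and `eta_min=1e-6` wrapped around an Adam optimizer initialized at `lr=1e-4`.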
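The Research Type row notes that restored images are scored with PSNR and SSIM. For reference, PSNR is a simple function of the mean squared error between the restored and ground-truth images; the snippet below is a generic sketch of that definition (function names are illustrative; SSIM is omitted as it is considerably more involved):

```python
import math

def psnr_from_mse(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB, given the mean squared
    error between two images with pixel range [0, max_val]."""
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10((max_val ** 2) / mse)
```

A perfect restoration (MSE = 0) gives infinite PSNR; the worst case for 8-bit images (MSE = 255²) gives 0 dB.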