One-Step Effective Diffusion Network for Real-World Image Super-Resolution
Authors: Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that OSEDiff achieves comparable or even better Real-ISR results, in terms of both objective metrics and subjective evaluations, than previous diffusion model-based Real-ISR methods that require dozens or hundreds of steps. |
| Researcher Affiliation | Collaboration | Rongyuan Wu (1,2), Lingchen Sun (1,2), Zhiyuan Ma (1), Lei Zhang (1,2); 1 The Hong Kong Polytechnic University, 2 OPPO Research Institute |
| Pseudocode | Yes | A.4 Algorithm of OSEDiff The pseudo-code of our OSEDiff training algorithm is summarized as Algorithm 1. |
| Open Source Code | Yes | The source codes are released at https://github.com/cswry/OSEDiff. |
| Open Datasets | Yes | For simplicity, we adopt SeeSR's setup [52] and train OSEDiff using the LSDIR [26] dataset and the first 10K face images from FFHQ [19]. The synthetic data includes 3000 images of size 512 × 512, whose GT are randomly cropped from DIV2K-Val [2] and degraded using the Real-ESRGAN pipeline [45]. |
| Dataset Splits | No | The paper mentions training and test datasets but does not explicitly state the use of a separate validation dataset split with specific percentages or sample counts. |
| Hardware Specification | Yes | The inference time is tested on an A100 GPU with 512 × 512 input image size. The entire training process took approximately 1 day on 4 NVIDIA A100 GPUs with a batch size of 16. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer [33] and the SD 2.1-base model, but it does not specify version numbers for general software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We train OSEDiff with the AdamW optimizer [33] at a learning rate of 5e-5. The entire training process took approximately 1 day on 4 NVIDIA A100 GPUs with a batch size of 16. The rank of LoRA in the VAE Encoder, diffusion network, and finetuned regularizer is set to 4. The weighting scalars λ1 and λ2 are set to 2 and 1, respectively. The cfg value is set to 7.5. |
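The reported training configuration can be collected into a minimal PyTorch sketch. This is not the authors' released code (see their GitHub repository for that); the model and loss terms here are illustrative placeholders, and only the hyperparameter values (AdamW, lr 5e-5, batch size 16, LoRA rank 4, λ1 = 2, λ2 = 1, cfg 7.5) come from the paper.

```python
import torch

# Hyperparameters as reported in the OSEDiff paper (Experiment Setup row).
LEARNING_RATE = 5e-5           # AdamW learning rate
BATCH_SIZE = 16                # total batch size across 4 A100 GPUs
LORA_RANK = 4                  # LoRA rank for VAE encoder / diffusion net / regularizer
LAMBDA_1, LAMBDA_2 = 2.0, 1.0  # loss weighting scalars
CFG_SCALE = 7.5                # classifier-free guidance scale

# Stand-in module for the LoRA-augmented diffusion network (placeholder only).
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

def total_loss(data_term: torch.Tensor, reg_term: torch.Tensor) -> torch.Tensor:
    """Weighted objective: lambda1 * data term + lambda2 * regularization term."""
    return LAMBDA_1 * data_term + LAMBDA_2 * reg_term

# One illustrative optimization step on random data.
x = torch.randn(BATCH_SIZE, 8)
out = model(x)
loss = total_loss(out.pow(2).mean(), out.abs().mean())
loss.backward()
optimizer.step()
```

Note that the reproducibility table flags the absence of pinned dependency versions, so rerunning the actual training code may require matching the PyTorch/diffusers versions by trial and error.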