One-Step Effective Diffusion Network for Real-World Image Super-Resolution
Authors: Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that OSEDiff achieves comparable or even better Real-ISR results, in terms of both objective metrics and subjective evaluations, than previous diffusion model-based Real-ISR methods that require dozens or hundreds of steps. |
| Researcher Affiliation | Collaboration | Rongyuan Wu (1,2), Lingchen Sun (1,2), Zhiyuan Ma (1), Lei Zhang (1,2); 1 The Hong Kong Polytechnic University, 2 OPPO Research Institute |
| Pseudocode | Yes | A.4 Algorithm of OSEDiff The pseudo-code of our OSEDiff training algorithm is summarized as Algorithm 1. |
| Open Source Code | Yes | The source codes are released at https://github.com/cswry/OSEDiff. |
| Open Datasets | Yes | For simplicity, we adopt SeeSR's setup [52] and train OSEDiff using the LSDIR [26] dataset and the first 10K face images from FFHQ [19]. The synthetic data includes 3000 images of size 512 × 512, whose GT are randomly cropped from DIV2K-Val [2] and degraded using the Real-ESRGAN pipeline [45]. |
| Dataset Splits | No | The paper mentions training and test datasets but does not explicitly state the use of a separate validation dataset split with specific percentages or sample counts. |
| Hardware Specification | Yes | The inference time is tested on an A100 GPU with 512 × 512 input image size. The entire training process took approximately 1 day on 4 NVIDIA A100 GPUs with a batch size of 16. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer [33] and the SD 2.1-base model, but it does not specify version numbers for general software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We train OSEDiff with the AdamW optimizer [33] at a learning rate of 5e-5. The entire training process took approximately 1 day on 4 NVIDIA A100 GPUs with a batch size of 16. The rank of LoRA in the VAE Encoder, diffusion network, and finetuned regularizer is set to 4. The weighting scalars λ1 and λ2 are set to 2 and 1, respectively. The cfg value is set to 7.5. |
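The reported training configuration can be collected into a minimal PyTorch sketch. This is not the authors' released code (see their GitHub repository for that); the model and loss terms here are illustrative placeholders, and only the hyperparameter values (AdamW, lr 5e-5, batch size 16, LoRA rank 4, λ1 = 2, λ2 = 1, cfg 7.5) come from the paper.

```python
import torch

# Hyperparameters as reported in the OSEDiff paper (Experiment Setup row).
LEARNING_RATE = 5e-5           # AdamW learning rate
BATCH_SIZE = 16                # total batch size across 4 A100 GPUs
LORA_RANK = 4                  # LoRA rank for VAE encoder / diffusion net / regularizer
LAMBDA_1, LAMBDA_2 = 2.0, 1.0  # loss weighting scalars
CFG_SCALE = 7.5                # classifier-free guidance scale

# Stand-in module for the LoRA-augmented diffusion network (placeholder only).
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

def total_loss(data_term: torch.Tensor, reg_term: torch.Tensor) -> torch.Tensor:
    """Weighted objective: lambda1 * data term + lambda2 * regularization term."""
    return LAMBDA_1 * data_term + LAMBDA_2 * reg_term

# One illustrative optimization step on random data.
x = torch.randn(BATCH_SIZE, 8)
out = model(x)
loss = total_loss(out.pow(2).mean(), out.abs().mean())
loss.backward()
optimizer.step()
```

Note that the reproducibility table flags the absence of pinned dependency versions, so rerunning the actual training code may require matching the PyTorch/diffusers versions by trial and error.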