A Modular Conditional Diffusion Framework for Image Reconstruction
Authors: Magauiya Zhussip, Iaroslav Koshelev, Stamatios Lefkimmiatis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Towards this goal, we propose a modular diffusion probabilistic IR framework (DP-IR), which allows us to combine the performance benefits of existing pre-trained state-of-the-art IR networks and generative DPMs, while it requires only the additional training of a relatively small module (0.7M params) related to the particular IR task of interest. Moreover, the architecture of the proposed framework allows for a sampling strategy that leads to at least four times reduction of neural function evaluations without suffering any performance loss, while it can also be combined with existing acceleration techniques such as DDIM. We evaluate our model on four benchmarks for the tasks of burst JDD-SR, dynamic scene deblurring, and super-resolution. Our method outperforms existing approaches in terms of perceptual quality while it retains a competitive performance with respect to fidelity metrics. (Abstract) |
| Researcher Affiliation | Industry | Magauiya Zhussip, MTS AI, m.zhussip@mts.ai; Iaroslav Koshelev, AI Foundation and Algorithm Lab, ys.koshelev@gmail.com; Stamatios Lefkimmiatis, MTS AI, s.lefkimmiatis@mts.ai |
| Pseudocode | No | The paper describes algorithms and architectures in prose and figures, but does not include any formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | Upon the acceptance of the paper we plan to release the inference code together with the trained models checkpoints. (NeurIPS Paper Checklist, Question 5) |
| Open Datasets | Yes | Our training procedure consists of two stages. We first employ a diverse, yet small DF2K (combination of DIV2K [1] and Flickr2K [45]) dataset to train a Denoising Module... To train our Fusion networks we use datasets that are common among our main competitors, specifically the Zurich RAW2RGB [29] dataset for burst JDD-SR, GoPro [53] for dynamic scene deblurring and DIV2K [1] for 4× SISR. |
| Dataset Splits | Yes | For 4× SISR we use the DIV2K validation dataset [1] consisting of 100 images of 2K resolution. (Section 4) Specifically, we use 3214 pairs of clean and blurry 1280×720 images, out of which we have excluded the 1111 pairs reserved for evaluation purposes. (Appendix H, Dynamic Scene Deblurring) |
| Hardware Specification | Yes | All the networks are trained using the Ascend 910 AI accelerators [44]. |
| Software Dependencies | No | The paper mentions 'Adam [32] optimizer' but does not provide specific version numbers for software libraries or frameworks like PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Specifically, we train it for 300k iterations with a learning rate of 10⁻⁴ · 0.99^(it/1000), batch size of 128, and crop size of 256×256. (Section 4) The batch size, training crop size, and initial learning rate are set to 8 (in total), 192×192, and 2×10⁻⁴, respectively. (Appendix H, Denoising Module) |
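The quoted schedule reads as an exponentially decayed learning rate, lr(it) = 10⁻⁴ · 0.99^(it/1000). A minimal sketch of that reading (the function name and exact decay semantics are our assumption, not taken from the paper):

```python
def decayed_lr(iteration, base_lr=1e-4, decay=0.99, decay_every=1000):
    """Exponential decay: base_lr * decay ** (iteration / decay_every).

    Reconstructed reading of the paper's "10^-4 * 0.99^(it/1000)"
    schedule; the continuous (non-stepped) decay is an assumption.
    """
    return base_lr * decay ** (iteration / decay_every)

# At iteration 0 the rate is the base 1e-4; after the full 300k
# iterations it has decayed by a factor of 0.99**300 (about 0.049).
start = decayed_lr(0)
end = decayed_lr(300_000)
print(start, end)
```

Under this reading the learning rate shrinks smoothly by roughly 20× over the 300k-iteration run, which is consistent with a single-stage training schedule without restarts.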