ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting

Authors: Zongsheng Yue, Jianyi Wang, Chen Change Loy

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even only with 15 sampling steps.
Researcher Affiliation | Academia | Zongsheng Yue, Jianyi Wang, Chen Change Loy; S-Lab, Nanyang Technological University; {zongsheng.yue,jianyi001,ccloy}@ntu.edu.sg
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and model are available at https://github.com/zsyOAOA/ResShift.
Open Datasets | Yes | HR images with a resolution of 256×256 in our training data are randomly cropped from the training set of ImageNet [50] following LDM [11].
Dataset Splits | No | HR images with a resolution of 256×256 in our training data are randomly cropped from the training set of ImageNet [50] following LDM [11]. We synthesize a testing dataset that contains 3000 images randomly selected from the validation set of ImageNet [50] based on the commonly-used degradation model, i.e., y = (x ⊛ k)↓s + n, where k is the blurring kernel, n is the noise, y and x denote the LR image and HR image, respectively. ... We name this dataset as ImageNet-Test for convenience. (A code sketch of this degradation model follows the table.)
Hardware Specification | Yes | Running time is tested on an NVIDIA Tesla V100 GPU on the ×4 (64→256) SR task.
Software Dependencies | No | The Adam [51] algorithm with the default settings of PyTorch [52] and a mini-batch size of 64 is used to train ResShift. (No specific PyTorch version is mentioned.)
Experiment Setup | Yes | HR images with a resolution of 256×256 in our training data are randomly cropped from the training set of ImageNet [50] following LDM [11]. We synthesize the LR images using the degradation pipeline of Real-ESRGAN [19]. The Adam [51] algorithm with the default settings of PyTorch [52] and a mini-batch size of 64 is used to train ResShift. During training, we use a fixed learning rate of 5e-5 and update the weight parameters for 500K iterations. As for the network architecture, we employ the UNet structure in DDPM [2]. To increase the robustness of ResShift to arbitrary image resolution, we replace the self-attention layer in UNet with the Swin Transformer [53] block. (A sketch of this training recipe also follows the table.)
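The degradation model quoted under Dataset Splits, y = (x ⊛ k)↓s + n, is the classical blur, downsample, and add-noise pipeline used to build ImageNet-Test. Below is a minimal PyTorch sketch of that pipeline; the box kernel, plain strided subsampling, and noise level are illustrative assumptions rather than the paper's exact test-set settings, and the function name classical_degradation is hypothetical.

```python
import torch
import torch.nn.functional as F

def classical_degradation(x, kernel, scale=4, noise_std=0.02):
    """Sketch of y = (x ⊛ k)↓s + n for an HR batch x of shape (B, C, H, W) in [0, 1].

    kernel is a 2-D blur kernel (kh, kw); scale is the SR factor s (x4 here);
    noise_std is an illustrative Gaussian noise level, not the paper's setting.
    """
    c = x.shape[1]
    kh, kw = kernel.shape
    k = kernel.to(x).view(1, 1, kh, kw).repeat(c, 1, 1, 1)        # one kernel per channel
    x_pad = F.pad(x, (kw // 2, kw // 2, kh // 2, kh // 2), mode="reflect")
    x_blur = F.conv2d(x_pad, k, groups=c)                         # x ⊛ k
    y = x_blur[..., ::scale, ::scale]                             # ↓s (plain strided subsampling)
    return (y + noise_std * torch.randn_like(y)).clamp(0.0, 1.0)  # + n

# Example: a random stand-in for a 256x256 HR crop degraded to a 64x64 LR image,
# matching the x4 (64 -> 256) setting; the 7x7 box kernel is purely illustrative.
hr = torch.rand(1, 3, 256, 256)
lr = classical_degradation(hr, torch.ones(7, 7) / 49.0, scale=4)
print(lr.shape)  # torch.Size([1, 3, 64, 64])
```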
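The Experiment Setup row pins down the optimization recipe: Adam with PyTorch's default settings, a mini-batch size of 64, a fixed learning rate of 5e-5, and 500K iterations. The sketch below wires only those quoted hyperparameters into a toy training loop; the tiny convolutional model, random tensors, and MSE loss are stand-ins, not the paper's DDPM UNet with Swin Transformer blocks or its diffusion objective.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Toy stand-in network; the paper's model is the DDPM UNet with its
# self-attention layers replaced by Swin Transformer blocks.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 3, 3, padding=1)
)

# Adam with PyTorch's defaults (betas=(0.9, 0.999), eps=1e-8, weight_decay=0);
# only the learning rate is set explicitly, fixed at 5e-5 with no schedule.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

num_iters, batch_size = 500_000, 64               # 500K updates, mini-batch size 64
for step in range(num_iters):
    lr_img = torch.rand(batch_size, 3, 64, 64)    # placeholder LR batch
    hr_img = torch.rand(batch_size, 3, 256, 256)  # placeholder HR batch
    pred = F.interpolate(model(lr_img), scale_factor=4, mode="bilinear")
    loss = F.mse_loss(pred, hr_img)               # placeholder loss, not the diffusion objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```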