Xformer: Hybrid X-Shaped Transformer for Image Denoising
Authors: Jiale Zhang, Yulun Zhang, Jinjin Gu, Jiahua Dong, Linghe Kong, Xiaokang Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that Xformer, under comparable model complexity, achieves state-of-the-art performance on the synthetic and real-world image denoising tasks. We also provide code and models at https://github.com/gladzhang/Xformer. |
| Researcher Affiliation | Collaboration | Jiale Zhang1, Yulun Zhang1, Jinjin Gu2,3, Jiahua Dong4, Linghe Kong1, Xiaokang Yang1; 1Shanghai Jiao Tong University, 2Shanghai AI Laboratory, 3University of Sydney, 4Shenyang Institute of Automation, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the model architecture and calculations in narrative text and diagrams (Fig 1, 3) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also provide code and models at https://github.com/gladzhang/Xformer. |
| Open Datasets | Yes | For Gaussian denoising, following the previous work (Liang et al., 2021), we use DIV2K (Timofte et al., 2017), Flickr2K (Lim et al., 2017), BSD500 (Arbelaez et al., 2010), and WED (Ma et al., 2016) as training data. ... For real image denoising, same as Restormer (Zamir et al., 2022), we use SIDD (Abdelhamed et al., 2018) to train our model. |
| Dataset Splits | No | The evaluation is performed on 1,280 patches of the SIDD validation set (Abdelhamed et al., 2018) and 50 pairs of images from the DND (Plotz & Roth, 2017). While it mentions a validation set, it does not provide explicit dataset split percentages or sample counts for train/validation/test across all datasets used. |
| Hardware Specification | Yes | Our Xformer is implemented on PyTorch (Paszke et al., 2017) using 4 Nvidia A100 GPUs. |
| Software Dependencies | No | Our Xformer is implemented on PyTorch (Paszke et al., 2017) using 4 Nvidia A100 GPUs. The paper mentions PyTorch but does not provide a specific version number for it or any other software dependency. |
| Experiment Setup | Yes | Firstly, we set the layer numbers of both branches the same, which are [2, 4, 4, 6, 4, 4, 2]. The number of CTBs in the refinement stage is set to 4. Secondly, we set the number of heads in corresponding layers to [1, 2, 4, 8, 4, 2, 1]. The head number of CTBs in the refinement stage is set to 1. Meanwhile, the channel number of shallow features generated by the first convolution layer is set to 48. The expansion size of hidden layers in FFN is set to 2.66. Thirdly, the window size in spatial-wise Transformer blocks is set to 16. Note that we also utilize the shifted-window strategy (Liu et al., 2021). Besides, we use pixel-unshuffle and pixel-shuffle operations (Shi et al., 2016) for downsampling and upsampling. Lastly, following the recent work (Zamir et al., 2022), we use the progressive training strategy for fair comparisons. ... Using the progressive training strategy proposed by Restormer (Zamir et al., 2022), we set the batch size and patch size pairs to [(64, 128^2), (40, 160^2), (32, 192^2), (16, 256^2), (8, 320^2), (8, 384^2)] at training iterations [0k, 92k, 156k, 204k, 240k, 276k]. AdamW (Loshchilov & Hutter, 2019) is used to optimize our model with β1 = 0.9, β2 = 0.999, and weight decay 10^-4. We train our model for a total of 300k iterations; the initial learning rate is set to 3 × 10^-4 and gradually reduced to 10^-6 through cosine annealing (Loshchilov & Hutter, 2017). |
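
The Experiment Setup row above amounts to a concrete training recipe. The following minimal PyTorch sketch wires those reported numbers together; it is an illustration, not code from the official repository, and the names `config` and `stage_for_iteration` (plus the stand-in `Conv2d` model) are hypothetical. The numeric hyperparameters are exactly the ones quoted from the paper.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Architecture hyperparameters quoted in the table above.
config = {
    "depths": [2, 4, 4, 6, 4, 4, 2],     # Transformer blocks per stage, both branches
    "num_heads": [1, 2, 4, 8, 4, 2, 1],  # attention heads per stage
    "refinement_blocks": 4,              # CTBs in the refinement stage (1 head each)
    "base_channels": 48,                 # shallow-feature channels after the first conv
    "ffn_expansion": 2.66,               # hidden-layer expansion factor in the FFN
    "window_size": 16,                   # spatial-wise attention window (shifted-window)
}

# Progressive training schedule (Restormer-style): the (batch size,
# square patch side) pair switches at the listed iteration milestones.
milestones  = [0, 92_000, 156_000, 204_000, 240_000, 276_000]
batch_patch = [(64, 128), (40, 160), (32, 192), (16, 256), (8, 320), (8, 384)]

def stage_for_iteration(it: int) -> tuple:
    """Return the (batch_size, patch_size) pair active at iteration `it`."""
    idx = max(i for i, m in enumerate(milestones) if it >= m)
    return batch_patch[idx]

# Optimizer and LR schedule as reported: AdamW with betas (0.9, 0.999) and
# weight decay 1e-4; initial LR 3e-4 decayed to 1e-6 by cosine annealing
# over the full 300k iterations.
model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real Xformer
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.999),
                  weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=300_000, eta_min=1e-6)

# Example: at iteration 160k the schedule calls for 32 crops of 192x192.
print(stage_for_iteration(160_000))  # -> (32, 192)
```

In an actual training loop one would rebuild the data loader's crop size and batch size at each milestone and call `scheduler.step()` once per iteration; the sketch only pins down the numbers the paper reports.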