Cross Aggregation Transformer for Image Restoration

Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our CAT outperforms recent state-of-the-art methods on several image restoration applications. The code and models are available at https://github.com/zhengchen1999/CAT.
Researcher Affiliation | Academia | Shanghai Jiao Tong University, ETH Zürich, Shanghai AI Laboratory, The University of Sydney, Harbin Institute of Technology (Shenzhen), Westlake University
Pseudocode | No | The paper describes its methods and architecture using text and diagrams but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | The code and models are available at https://github.com/zhengchen1999/CAT.
Open Datasets | Yes | For image SR, we choose DIV2K [37] and Flickr2K [22] as the training data. For JPEG compression artifact reduction, the training set consists of DIV2K [37], Flickr2K [22], BSD500 [2], and WED [26]. For real image denoising, we train CAT on the SIDD [1] dataset.
Dataset Splits | Yes | For real image denoising, we train CAT on the SIDD [1] dataset and use two testing datasets: the SIDD validation set [1] and DND [33].
Hardware Specification | Yes | We use PyTorch [32] to implement our models with 4 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using PyTorch [32] but does not specify its version or other software dependencies with version numbers.
Experiment Setup | Yes | We set the residual group (RG) number as N1=6 and the cross aggregation Transformer block (CATB) number as N2=6 for each RG. The channel dimension, attention head number, and MLP expansion ratio for each CATB are set as 180, 6, and 4, respectively. For image SR, we train the model with batch size 32, where each input image is randomly cropped to 64×64, and the total training iterations are 500K. We adopt the Adam optimizer [18] with β1=0.9 and β2=0.99 to minimize the L1 loss... The initial learning rate is set as 2×10^-4 and halved at the milestones [250K, 400K, 450K, 475K].
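The training recipe quoted in the Experiment Setup row maps onto a few lines of standard PyTorch. The sketch below is an illustration under assumptions, not the authors' code: the small convolution stands in for the CAT model (which in practice also upsamples for SR), the random tensors stand in for DIV2K/Flickr2K crops, and only the hyperparameters quoted above (Adam with β1=0.9, β2=0.99, initial learning rate 2×10^-4 halved at 250K/400K/450K/475K iterations, L1 loss, batch size 32, 64×64 crops) come from the paper.

```python
# Hedged sketch of the quoted training setup; see https://github.com/zhengchen1999/CAT
# for the actual CAT implementation (N1=6 RGs x N2=6 CATBs, dim 180, 6 heads, MLP ratio 4).
import torch

# Placeholder model standing in for CAT (ignores the SR upscaling step).
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Adam with beta1=0.9, beta2=0.99 and initial learning rate 2e-4, as quoted.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.99))

# Learning rate halved at the quoted milestones over 500K total iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250_000, 400_000, 450_000, 475_000], gamma=0.5
)

criterion = torch.nn.L1Loss()  # the paper minimizes the L1 loss

# One illustrative iteration on a random batch of 64x64 crops (batch size 32);
# the real loop would run for 500K iterations over DIV2K/Flickr2K patches.
lr_patch = torch.rand(32, 3, 64, 64)
hr_patch = torch.rand(32, 3, 64, 64)

optimizer.zero_grad()
loss = criterion(model(lr_patch), hr_patch)
loss.backward()
optimizer.step()
scheduler.step()  # stepped once per training iteration, matching the milestone units
```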