Sharing Key Semantics in Transformer Makes Efficient Image Restoration

Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across 6 IR tasks confirm the proposed SemanIR's state-of-the-art performance, quantitatively and qualitatively showcasing advancements.
Researcher Affiliation | Collaboration | Bin Ren¹˒²˒³, Yawei Li⁴, Jingyun Liang⁴, Rakesh Ranjan⁵, Mengyuan Liu⁶, Rita Cucchiara⁷, Luc Van Gool³, Ming-Hsuan Yang⁸, Nicu Sebe². ¹University of Pisa, ²University of Trento, ³INSAIT, Sofia University, ⁴ETH Zürich, ⁵Meta Reality Labs, ⁶State Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School, ⁷University of Modena and Reggio Emilia, ⁸University of California, Merced.
Pseudocode | Yes | Algorithm 1: Key-Semantic Transformer Stage (i.e., SemanIR Stage).
Open Source Code | Yes | The visual results, code, and trained models are available at https://github.com/Amazingren/SemanIR.
Open Datasets | Yes | Training datasets: DIV2K [1], Flickr2K [51], and WED [57]. Test datasets: Classic5 [22], LIVE1 [75], Urban100 [30], BSD500 [2].
Dataset Splits | No | The paper lists training and testing datasets but does not explicitly provide validation splits (e.g., percentages, counts, or the method used to construct validation sets) for its experiments.
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA Tesla V100 32GB GPUs.
Software Dependencies | No | The paper mentions optimizers (Adam, AdamW) and loss functions (smooth L1, VGG, Charbonnier, L1) and hints at PyTorch (torch.gather(), torch-mask) and Triton, but does not specify version numbers for these software dependencies (e.g., PyTorch 1.x, CUDA 11.x).
Experiment Setup | Yes | Batch size and patch size: matching the comparison methods, (batch size = 16, patch size = 64) for JPEG CAR, denoising, demosaicking, and SR; (batch size = 32, patch size = 16) for IR in adverse weather conditions (AWC); (batch size = 8, patch size = 192) for deblurring. Learning rate schedule: for all IR tasks, the initial learning rate is set to 2×10⁻⁴ and halved during training. Training iterations: 1M for JPEG CAR, denoising, demosaicking, and SR; 750K for IR in AWC and deblurring.
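The half-decay schedule reported above (initial learning rate 2×10⁻⁴, halved at fixed points over the 1M-iteration run) can be sketched as a small plain-Python helper. The milestone fractions below are illustrative assumptions: the quoted setup states that half-decay is used but not at which iterations the halvings occur.

```python
def lr_at_iteration(it, total_iters=1_000_000, base_lr=2e-4,
                    milestone_fracs=(0.5, 0.75, 0.875, 0.9375)):
    """Half-decay schedule: the learning rate is halved each time training
    passes a milestone iteration.

    NOTE: the milestone fractions are illustrative assumptions, not values
    reported in the paper excerpt.
    """
    milestones = [int(f * total_iters) for f in milestone_fracs]
    halvings = sum(1 for m in milestones if it >= m)
    return base_lr * (0.5 ** halvings)

# The LR stays at 2e-4 until the first (assumed) milestone, then halves.
print(lr_at_iteration(100_000))  # → 0.0002
print(lr_at_iteration(600_000))  # → 0.0001
```

The same behavior would typically be realized in PyTorch with a multi-step scheduler (e.g., `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.5`); the helper here just makes the decay rule explicit and checkable.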