CosAE: Learnable Fourier Series for Image Restoration
Authors: Sifei Liu, Shalini De Mello, Jan Kautz
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the advantage of CosAE via extensive experiments on flexible-resolution super-resolution and blind image restoration, two highly challenging tasks that demand the restoration network to effectively generalize to complex and even unknown image degradations. Our method surpasses state-of-the-art approaches, highlighting its capability to learn a generalizable representation for image restoration. The project page is maintained at https://sifeiliu.net/CosAE-page/. |
| Researcher Affiliation | Industry | Sifei Liu, Shalini De Mello, Jan Kautz NVIDIA {sifeil, shalinig, jkautz}@nvidia.com |
| Pseudocode | No | The paper does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | Codes will be released upon legal approval. |
| Open Datasets | Yes | We train the face model using cropped faces from the training splits of FFHQ [50] and CelebA-HQ [51, 52], utilizing the official training and validation splits. To perform FR-SR training on natural images, we initially pretrain CosAE on the ImageNet [54] training split. Subsequently, we fine-tune the model using the same training settings on a combination of the DIV-2K [24] training set and all the images from Flickr2K [55] (referred to as the DF2K dataset). To evaluate the approach, we further introduce a synthetic dataset produced on the COCO validation split [66], namely COCO-Test, using the same corruption operators as in training. |
| Dataset Splits | Yes | We train the face model using cropped faces from the training splits of FFHQ [50] and CelebA-HQ [51, 52], utilizing the official training and validation splits. To evaluate the output images, we use FID [53] and LPIPS [48] metrics, along with PSNR and SSIM to compare with methods that use pixel regression objectives [10, 32, 33]. We compared CosAE with LIIF [10], aligned in the same way as when training the face model and denoted LIIF-32x, since the encoder for natural images has a 32× downsampling stride. Both models were trained with the same datasets, input settings, and objectives. To evaluate the performance, we used LPIPS on the DIV-2K validation set, considering the adoption of the discriminator. |
| Hardware Specification | Yes | As shown in Table 4, CosAE outperforms most models in both efficiency (i.e., number of parameters and runtime) and performance. Qualitative visual comparisons with LDM-4 [7] are shown in Figure 14. We use the FID score with the reference of 50k validation images in ImageNet, and LPIPS [48] as the evaluation metrics. In addition, model size as well as the inference speed (ours are tested on a single V100) are reported. *(A hedged timing/LPIPS sketch appears after the table.)* |
| Software Dependencies | No | The paper mentions frameworks or concepts (e.g., building on [6], which likely implies PyTorch, and GPU usage implying CUDA) but does not provide specific version numbers for any software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train CosAE on 256 × 256 patches for all the downstream tasks, while setting c = 256 for all the experiments. Please refer to Sec. 3.2 and 3.3 for other hyper-parameters. We apply the weight λ for the discriminative loss L_GAN of the PatchGAN as 0.8, while fixing the weight for the regression loss L_rec to 1. We adopt the same configuration for the Adam optimizer as introduced in the VQVAE training pipeline in [6, 7], for all the models. *(A hedged sketch of this objective appears below the table.)* |
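
The Experiment Setup row pins down the loss weighting (λ = 0.8 on the PatchGAN term, 1.0 on the regression term) and defers the optimizer to the VQ pipeline of [6, 7]. Since the code is not yet released, the following is only a minimal PyTorch sketch of that objective: the tiny `autoencoder` and `discriminator` modules are placeholders for CosAE and its PatchGAN discriminator, the L1 choice for L_rec is an assumption, and the Adam settings (betas = (0.5, 0.9), taming-transformers-style) are assumed rather than confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks: the real CosAE and PatchGAN discriminator are
# not public ("Codes will be released upon legal approval").
autoencoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(16, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 16, 4, stride=2, padding=1),
                              nn.LeakyReLU(0.2),
                              nn.Conv2d(16, 1, 4, stride=2, padding=1))

LAMBDA_GAN = 0.8  # discriminative-loss weight reported in the paper
LAMBDA_REC = 1.0  # regression-loss weight reported in the paper

def generator_loss(x):
    """Weighted objective L = LAMBDA_REC * L_rec + LAMBDA_GAN * L_GAN."""
    x_hat = autoencoder(x)
    l_rec = F.l1_loss(x_hat, x)           # L_rec; L1 is an assumption here
    l_gan = -discriminator(x_hat).mean()  # non-saturating patch-GAN term
    return LAMBDA_REC * l_rec + LAMBDA_GAN * l_gan

# Adam configured per the VQ pipeline the paper cites ([6, 7]);
# betas=(0.5, 0.9) matches the taming-transformers defaults (assumed).
opt = torch.optim.Adam(autoencoder.parameters(), lr=4.5e-6, betas=(0.5, 0.9))

x = torch.rand(2, 3, 256, 256)  # 256 x 256 patches, as in the paper
loss = generator_loss(x)
loss.backward()
opt.step()
```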
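
The Hardware Specification row reports inference speed on a single V100 and LPIPS [48] as a metric, but not the measurement protocol. The sketch below shows one standard way to obtain such numbers with the `lpips` package; the identity `model` is a stand-in for the paper's network, and the warm-up/iteration counts are arbitrary choices, not values from the paper.

```python
import time
import torch
import lpips  # pip install lpips; the perceptual metric cited as [48]

model = torch.nn.Identity().cuda().eval()  # stand-in restoration network
x = torch.rand(1, 3, 256, 256, device="cuda")

# Wall-clock inference speed on a single GPU (e.g., a V100):
# synchronize around the timed region so CUDA launches are counted.
with torch.no_grad():
    for _ in range(10):           # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        y = model(x)
    torch.cuda.synchronize()
    print(f"runtime: {(time.time() - t0) / 100 * 1e3:.2f} ms/image")

# LPIPS between restored output and ground truth; inputs in [-1, 1].
loss_fn = lpips.LPIPS(net="alex").cuda()
gt = torch.rand(1, 3, 256, 256, device="cuda")
print("LPIPS:", loss_fn(y * 2 - 1, gt * 2 - 1).item())
```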