Controlling Vision-Language Models for Multi-Task Image Restoration
Authors: Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluate our method on two types of tasks: degradation-specific image restoration and unified image restoration. [...] We use the Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) and Fréchet inception distance (FID) (Heusel et al., 2017) as our main metrics for perceptual evaluation, but also report PSNR and SSIM (Wang et al., 2004) for reference. [...] Our method achieves the best perceptual results across all tasks, and even sets a new state-of-the-art performance for all metrics on the image deraining task. |
| Researcher Affiliation | Academia | Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön — Department of Information Technology, Uppsala University. {ziwei.luo,fredrik.gustafsson,zheng.zhao}@it.uu.se, {jens.sjolund,thomas.schon}@it.uu.se |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Algolzw/daclip-uir. |
| Open Datasets | Yes | In addition, we construct a large mixed-degradation dataset for ten different image restoration tasks based on BLIP (Li et al., 2022b). [...] We collect a large dataset with ten different image degradation types: blurry, hazy, JPEG-compressing, low-light, noisy, raindrop, rainy, shadowed, snowy, and inpainting. Table 1 summarises the tasks and the number of training and testing images for each degradation type, and more details are provided in Appendix A. [...] Blurry: collected from the GoPro (Nah et al., 2017) dataset... Hazy: collected from the RESIDE-6k (Qin et al., 2020) dataset... JPEG-compressing: the training dataset has 3440 images collected from DIV2K (Agustsson & Timofte, 2017) and Flickr2K (Timofte et al., 2017). |
| Dataset Splits | No | The paper mentions 'validation' in the context of model evaluation (e.g., 'validation loss') but does not provide explicit details about a separate validation dataset split (e.g., percentages, sample counts, or a clear methodology for creating one) for reproducibility. |
| Hardware Specification | Yes | We train the DA-CLIP model on four NVIDIA A100 GPUs for 50 epochs, in approximately 3 hours. [...] All training is done using one A100 GPU for about 5 days. |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries. |
| Experiment Setup | Yes | We fine-tune the DA-CLIP on the mixed degradation dataset with a batch size of 3136 (784 × 4) and learning rate 3 × 10⁻⁵. In preprocessing, all inputs are normalized in the range [0, 1] and resized to 224 × 224 with bicubic interpolation. [...] For the restoration model, we use a batch size of 16 and randomly crop images to 256 × 256 for data augmentation. The initial learning rate is 2 × 10⁻⁴. We use the AdamW (Loshchilov & Hutter, 2017) optimizer (β1 = 0.9, β2 = 0.99) with cosine decay for a total of 700K iterations. |
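
The restoration model's learning-rate schedule quoted above (initial rate 2 × 10⁻⁴, cosine decay over 700K iterations) can be sketched as a standalone function. This is an illustrative sketch, not the authors' code; the minimum learning rate `lr_min` is an assumption, since the paper does not state a decay floor.

```python
import math

def cosine_lr(step, total_steps=700_000, lr_init=2e-4, lr_min=0.0):
    """Cosine-decayed learning rate per the reported setup.

    lr_init and total_steps come from the paper; lr_min=0.0 is an
    assumption (no floor is reported).
    """
    t = min(step, total_steps) / total_steps  # progress in [0, 1]
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * t))
```

For example, `cosine_lr(0)` returns the initial rate 2e-4, and the rate falls to half its initial value at the schedule midpoint (step 350,000).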