Lossy Image Compression with Conditional Diffusion Models
Authors: Ruihan Yang, Stephan Mandt
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. |
| Researcher Affiliation | Academia | Ruihan Yang, Department of Computer Science, University of California, Irvine, ruihan.yang@uci.edu; Stephan Mandt, Department of Computer Science, University of California, Irvine, mandt@uci.edu |
| Pseudocode | Yes | Algorithm 1 Training the model (left); Encoding/Decoding data x0 (right). X-prediction model. |
| Open Source Code | Yes | Our code is available at: https://github.com/buggyyang/CDC_compression |
| Open Datasets | Yes | We consider the following datasets with necessary preprocessing: 1. Kodak (Franzen, 2013): ... 2. Tecnick (Asuni & Giachetti, 2014): ... 3. DIV2K (Agustsson & Timofte, 2017): ... 4. COCO2017 (Lin et al., 2014): ... Our model was trained using the well-established Vimeo-90k dataset (Xue et al., 2019)... |
| Dataset Splits | Yes | DIV2K (Agustsson & Timofte, 2017): The validation set of this dataset contains 100 high-quality images. We resize the images with the shorter dimension being equal to 768px. Then, each image is center-cropped to a 768×768 squared shape. COCO2017 (Lin et al., 2014): For this dataset, we extract all test images with resolutions higher than 512×512 and resize them to 384×384 resolution to remove compression artifacts. The resulting dataset consists of 2695 images. Our model was trained using the well-established Vimeo-90k dataset (Xue et al., 2019)... |
| Hardware Specification | Yes | We run benchmarking on a server with a RTX A6000. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify software dependencies like programming language versions or library versions (e.g., Python, PyTorch) used for their implementation. |
| Experiment Setup | Yes | The training procedure initiated with a warm-up phase, setting λ to 10⁻⁶ and running the model for approximately 700,000 steps. Subsequently, we increased λ to align with the desired bitrates and continued training for an additional 1,000,000 steps until the model reached convergence. For the ϵ-prediction model, our training utilized a diffusion process comprising N_train = 20,000 steps. Conversely, the number of diffusion steps for the X-prediction model is N_train = 8,193. We implemented a linear variance schedule to optimize the ϵ-prediction model, while a cosine schedule was selected for the X-prediction model optimization. Throughout the training regime, we maintained a batch size of 4. The Adam optimizer (Kingma & Ba, 2014) was employed to facilitate efficient convergence. We commenced with an initial learning rate of lr = 5 × 10⁻⁵, which was reduced by 20% after every 100,000 steps, ultimately clipped to a learning rate of lr = 2 × 10⁻⁵. |
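
The Dataset Splits row describes the evaluation preprocessing in enough detail to sketch it. Below is a minimal, hedged reconstruction using torchvision; the pipeline names (`div2k_eval`, `coco_eval`) and the example file path are illustrative assumptions, not taken from the authors' repository.

```python
from PIL import Image
from torchvision import transforms

# Resize the shorter side to 768 px, then take a 768x768 center crop (DIV2K eval).
div2k_eval = transforms.Compose([
    transforms.Resize(768),
    transforms.CenterCrop(768),
    transforms.ToTensor(),
])

# Downscale high-resolution COCO2017 test images to 384x384 to remove compression artifacts.
coco_eval = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
])

# Example usage: "example.png" is a placeholder path.
x = div2k_eval(Image.open("example.png").convert("RGB"))  # tensor of shape (3, 768, 768)
```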
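
The Experiment Setup row spells out the optimizer and learning-rate schedule, so a small sketch can make it concrete. The snippet below assumes PyTorch; the placeholder model, the helper names (`lr_at_step`, `rd_weight`), and the post-warm-up λ value (`target_lambda`) are assumptions for illustration, since the paper only states that λ is raised "to align with the desired bitrates."

```python
import torch

# Placeholder network standing in for the conditional diffusion codec.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # Adam, batch size 4 in the paper

def lr_at_step(step, base_lr=5e-5, min_lr=2e-5, decay=0.8, every=100_000):
    """Start at 5e-5, reduce by 20% every 100k steps, clip at 2e-5."""
    return max(min_lr, base_lr * decay ** (step // every))

def rd_weight(step, warmup_steps=700_000, warmup_lambda=1e-6, target_lambda=1e-3):
    """lambda = 1e-6 during the ~700k-step warm-up, then raised to match the
    desired bitrate (target_lambda is an assumed placeholder value)."""
    return warmup_lambda if step < warmup_steps else target_lambda

# Example: at step 350,000 the schedule gives 5e-5 * 0.8**3 = 2.56e-5,
# still above the 2e-5 floor.
for group in optimizer.param_groups:
    group["lr"] = lr_at_step(350_000)
```

In a full training loop, `lr_at_step` and `rd_weight` would be evaluated each iteration before computing the rate plus λ-weighted distortion loss on a batch of 4 images.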