Lossy Image Compression with Conditional Diffusion Models

Authors: Ruihan Yang, Stephan Mandt

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
Researcher Affiliation | Academia | Ruihan Yang, Department of Computer Science, University of California, Irvine, ruihan.yang@uci.edu; Stephan Mandt, Department of Computer Science, University of California, Irvine, mandt@uci.edu
Pseudocode | Yes | Algorithm 1: Training the model (left); Encoding/Decoding data x0 (right). X-prediction model. (A sketch of such a training step appears after the table.)
Open Source Code | Yes | Our code is available at: https://github.com/buggyyang/CDC_compression
Open Datasets | Yes | We consider the following datasets with necessary preprocessing: 1. Kodak (Franzen, 2013): ... 2. Tecnick (Asuni & Giachetti, 2014): ... 3. DIV2K (Agustsson & Timofte, 2017): ... 4. COCO2017 (Lin et al., 2014): ... Our model was trained using the well-established Vimeo-90k dataset (Xue et al., 2019)...
Dataset Splits | Yes | DIV2K (Agustsson & Timofte, 2017): The validation set of this dataset contains 100 high-quality images. We resize the images with the shorter dimension being equal to 768px. Then, each image is center-cropped to a 768×768 square shape. COCO2017 (Lin et al., 2014): For this dataset, we extract all test images with resolutions higher than 512×512 and resize them to 384×384 resolution to remove compression artifacts. The resulting dataset consists of 2695 images. Our model was trained using the well-established Vimeo-90k dataset (Xue et al., 2019)... (A preprocessing sketch appears after the table.)
Hardware Specification | Yes | We run benchmarking on a server with an RTX A6000.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify software dependencies such as programming language or library versions (e.g., Python, PyTorch) used for the implementation.
Experiment Setup | Yes | The training procedure began with a warm-up phase, setting λ to 10⁻⁶ and running the model for approximately 700,000 steps. Subsequently, we increased λ to align with the desired bitrates and continued training for an additional 1,000,000 steps until the model reached convergence. For the ϵ-prediction model, our training utilized a diffusion process comprising N_train = 20,000 steps. Conversely, the number of diffusion steps for the X-prediction model is N_train = 8,193. We implemented a linear variance schedule to optimize the ϵ-prediction model, while a cosine schedule was selected for the X-prediction model optimization. Throughout the training regime, we maintained a batch size of 4. The Adam optimizer (Kingma & Ba, 2014) was employed to facilitate efficient convergence. We commenced with an initial learning rate of lr = 5×10⁻⁵, which was reduced by 20% after every 100,000 steps and ultimately clipped to a learning rate of lr = 2×10⁻⁵. (An optimizer and learning-rate schedule sketch appears after the table.)
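To make the "Pseudocode" row more concrete, the following is a minimal sketch of one X-prediction training step for a conditional diffusion compression model, assuming a PyTorch setup. The names `encoder`, `denoiser`, `alphas_cumprod`, and the placeholder rate term are illustrative assumptions, not taken from the paper or its repository.

```python
# Hypothetical sketch (not the authors' code): one x-prediction training step
# for a conditional diffusion compression model in PyTorch.
import torch
import torch.nn.functional as F

def training_step(encoder, denoiser, x0, alphas_cumprod, lam):
    """encoder: maps images to a conditioning latent z (to be entropy-coded).
    denoiser: predicts the clean image from (x_t, t, z)  (x-prediction).
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products.
    lam: rate-distortion trade-off weight (the paper's lambda)."""
    B = x0.shape[0]
    N = alphas_cumprod.shape[0]

    z = encoder(x0)                                   # conditioning latent

    # Sample a timestep and corrupt x0 with Gaussian noise.
    t = torch.randint(0, N, (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # X-prediction: the network estimates x0 directly (not the noise).
    x0_hat = denoiser(x_t, t, z)

    # Distortion plus a lambda-weighted rate penalty; the actual model uses an
    # entropy model over z, replaced here by a crude placeholder.
    distortion = F.mse_loss(x0_hat, x0)
    rate = z.abs().mean()
    return distortion + lam * rate
```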
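The "Dataset Splits" row spells out the evaluation preprocessing. Below is a sketch of how that preprocessing could be reproduced with torchvision; the interpolation defaults and the exact filtering rule for COCO2017 are assumptions, not taken from the released code.

```python
# Sketch of the evaluation preprocessing described in the "Dataset Splits" row.
from PIL import Image
from torchvision import transforms

# DIV2K: resize so the shorter side is 768 px, then center-crop to 768x768.
div2k_transform = transforms.Compose([
    transforms.Resize(768),          # shorter dimension -> 768 px
    transforms.CenterCrop(768),      # 768 x 768 square
    transforms.ToTensor(),
])

# COCO2017: keep only images larger than 512x512, then resize to 384x384
# to suppress compression artifacts.
def load_coco_image(path: str):
    img = Image.open(path).convert("RGB")
    if min(img.size) <= 512:
        return None                  # skipped: resolution too low
    coco_transform = transforms.Compose([
        transforms.Resize((384, 384)),
        transforms.ToTensor(),
    ])
    return coco_transform(img)
```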
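The "Experiment Setup" row fully specifies the optimizer and the learning-rate decay. The snippet below sketches that schedule under the stated numbers (initial lr 5×10⁻⁵, 20% decay every 100,000 steps, clipped at 2×10⁻⁵); the surrounding training-loop code and function names are hypothetical.

```python
# Sketch of the Adam optimizer and step-wise learning-rate decay described
# in the "Experiment Setup" row; only the numbers come from the paper.
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    return torch.optim.Adam(model.parameters(), lr=5e-5)

def lr_at_step(step: int, base_lr: float = 5e-5, floor: float = 2e-5) -> float:
    """Reduce the learning rate by 20% every 100,000 steps, clipped at 2e-5."""
    lr = base_lr * (0.8 ** (step // 100_000))
    return max(lr, floor)

# Example usage inside a training loop:
# for step, batch in enumerate(loader):
#     for group in optimizer.param_groups:
#         group["lr"] = lr_at_step(step)
#     ...
```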