Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
Authors: Yiyang Ma, Wenhan Yang, Jiaying Liu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods. |
| Researcher Affiliation | Academia | 1Wangxuan Institute of Computer Technology, Peking University, Beijing, China 2Pengcheng Laboratory, Shenzhen, China. Correspondence to: Jiaying Liu <liujiaying@pku.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Encoder Side with DDIM. ... Algorithm 2 Decoder Side with DDIM. (A generic DDIM sampling sketch follows the table.) |
| Open Source Code | Yes | The project is at https://realpasu.github.io/CorrDiff_Website. |
| Open Datasets | Yes | We train all the models on the dataset of DIV2K (Agustsson & Timofte, 2017) which includes 800 high-resolution images. ... We evaluate our method on 3 datasets: Kodak (Kodak, 2024), CLIC professional (Toderici et al., 2020) and DIV2K-test (Agustsson & Timofte, 2017). |
| Dataset Splits | No | The paper mentions training on DIV2K and testing on DIV2K-test, Kodak, and CLIC professional datasets, but it does not explicitly provide details about a validation dataset split, such as specific percentages or sample counts for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using PyTorch, DISTS, LPIPS, DDIM, and Adam optimizer but does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.x' or 'Python 3.x'). |
| Experiment Setup | Yes | We first train only the score network for 400,000 iterations and then train the entire framework for another 400,000 iterations with a batch size of 8, a learning rate of 5e-5, and the Adam optimizer (Kingma & Ba, 2014). ... We randomly crop them into 256x256 patches in the training process. (A hedged training-setup sketch follows the table.) |
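
The Pseudocode row cites Algorithm 1 (encoder side) and Algorithm 2 (decoder side), both built on DDIM. The sketch below is a generic deterministic DDIM sampling loop (eta = 0), not a reproduction of the paper's algorithms; `score_net`, `alpha_bars`, and `timesteps` are assumed names for a noise-prediction network eps_theta(x_t, t), the cumulative noise schedule alpha_bar_t, and the chosen subsequence of step indices.

```python
import torch

@torch.no_grad()
def ddim_sample(score_net, x_T, alpha_bars, timesteps):
    """Deterministic DDIM sampling (eta = 0) -- an illustrative sketch,
    NOT the paper's Algorithms 1/2.

    score_net(x, t) is assumed to predict the noise eps_theta(x_t, t);
    alpha_bars is a 1-D tensor of cumulative schedule values alpha_bar_t;
    timesteps is an increasing list of integer step indices to visit.
    """
    x = x_T
    for i in range(len(timesteps) - 1, 0, -1):
        t, t_prev = timesteps[i], timesteps[i - 1]
        a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
        eps = score_net(x, t)
        # Predict the clean sample x_0 from the current noisy x_t.
        x0_pred = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
        # DDIM update: step to t_prev along the deterministic trajectory.
        x = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps
    return x
```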
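
The Experiment Setup row quotes concrete hyperparameters: Adam, learning rate 5e-5, batch size 8, random 256x256 crops of DIV2K, and a two-stage 400k + 400k iteration schedule. Below is a minimal PyTorch sketch of that configuration, under loose assumptions; `DIV2K_ROOT`, the `ImageFolder` loader, the stand-in `nn.Conv2d` model, and the MSE loss are all placeholders, not the authors' actual pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Random 256x256 crops, as described in the paper's training setup.
transform = transforms.Compose([
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])
# "DIV2K_ROOT" is a placeholder path; ImageFolder (which expects one
# subdirectory per class) is a generic loader, not the authors' pipeline.
dataset = datasets.ImageFolder("DIV2K_ROOT", transform=transform)
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real framework
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# Two-stage schedule from the paper: 400k iterations training the score
# network only, then 400k more training the entire framework. Only one
# stage is sketched here.
iteration = 0
while iteration < 400_000:
    for images, _ in loader:
        optimizer.zero_grad()
        recon = model(images)
        loss = nn.functional.mse_loss(recon, images)  # placeholder loss
        loss.backward()
        optimizer.step()
        iteration += 1
        if iteration >= 400_000:
            break
```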