Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

One-Step Diffusion-Based Image Compression with Semantic Distillation

Authors: Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that OneDC achieves SOTA perceptual quality even with one-step generation, offering over 39% bitrate reduction and 20× faster decoding compared to prior multistep diffusion-based codecs. Project: https://onedc-codec.github.io/ Extensive experiments show that OneDC achieves SOTA compression performance while offering significantly faster decoding than existing diffusion-based codecs, demonstrating the potential of one-step diffusion in generative compression.
Researcher Affiliation | Collaboration | Naifu Xue¹, Zhaoyang Jia², Jiahao Li³, Bin Li³, Yuan Zhang¹, Yan Lu³. ¹ Communication University of China, ² University of Science and Technology of China, ³ Microsoft Research Asia. EMAIL, {jzy_ustc}@mail.ustc.edu.cn, EMAIL
Pseudocode | No | The paper describes the methodology in prose and mathematical equations throughout Section 3 'Methodology' and its subsections. There are no explicitly labeled pseudocode or algorithm blocks, nor are there structured steps formatted like code.
Open Source Code | Yes | Project: https://onedc-codec.github.io/ Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code and models will be publicly available.
Open Datasets | Yes | We evaluate OneDC on several datasets, including Kodak [16], CLIC2020 test set [55], and MS-COCO 30K [35]. Reconstruction fidelity is evaluated using perceptual metrics LPIPS [24] and DISTS [13], along with the traditional metrics PSNR and MS-SSIM [60], while generative realism is measured by the no-reference perceptual metric FID [19]. To enable comprehensive comparison, we further evaluate our method on the DIV2K test set [1] under the full-resolution setting, as shown in Fig. 11.
Dataset Splits | Yes | To improve high-resolution adaptability, we randomly crop patches of size 512 or 1024 during training. Models are optimized using AdamW [38]. Additional settings are provided in the supplementary material. At the full-resolution setting, we compute FID using overlapping 256 × 256 patches for the CLIC2020 and DIV2K test sets [1], following the protocol of [41]. For the MS-COCO 30K dataset with 512 × 512 images, FID is evaluated on entire images, consistent with [9, 28]. At the resize & center-crop setting, we resize the short side of each image (512 for Kodak, 768 for CLIC2020 test set) and then apply a center crop. In this setting, we use 64 × 64 patches for FID calculation on Kodak and 128 × 128 patches on CLIC2020 test set, consistent with DDCM [46]. During training, image patches of size {512, 1024} are randomly cropped with probabilities of {0.6, 0.4}, respectively. The batch size is set to 32 for 512 × 512 crops and 8 for 1024 × 1024 crops (across 4 GPUs).
Hardware Specification | Yes | Operation Efficiency. We evaluate the coding times of different methods on 1024 × 1024 images using an A100 GPU.
Software Dependencies | No | From Appendix C, Model Details: For the U-Net used in $g_a$, we use the implementation from the diffusers library [56]. However, a specific version number for this library is not provided. Other software components like Python, PyTorch, or CUDA are mentioned without version numbers.
Experiment Setup | Yes | Stage I training. This stage focuses on training the compression module and fine-tuning the one-step diffusion model [66] for the image reconstruction task. The training loss is defined as: $\mathcal{L}_{\text{stage I}} = \mathcal{L}_{\text{recon}} + \lambda R + \alpha \mathcal{L}_{\text{aux}}$, where $\mathcal{L}_{\text{recon}} = \mathcal{L}_{1}(x, \hat{x}) + \mathcal{L}_{\text{perceptual}}(x, \hat{x})$ (6). We use the $\mathcal{L}_{1}$ as the pixel-level loss and the LPIPS [24] as the perceptual-level loss. To support various bitrates, the rate-distortion trade-off parameter $\lambda$ is set to {0.6, 1.0, 1.8, 2.9, 4.6, 7.4, 12.2}. An auxiliary code prediction loss $\mathcal{L}_{\text{aux}}$ is included with a weighting factor of $\alpha = 0.001$. We train our model on the dataset introduced in [17]. Training is performed on 4 A100 GPUs for 800,000 steps, using a three-stage learning rate schedule with AdamW [38]: a) 5e-5 for the first 500,000 steps; b) 1e-5 for the next 200,000 steps; c) 1e-6 for the final 100,000 steps. During training, image patches of size {512, 1024} are randomly cropped with probabilities of {0.6, 0.4}, respectively. The batch size is set to 32 for 512 × 512 crops and 8 for 1024 × 1024 crops (across 4 GPUs). Stage II training. The weighting parameters are set as follows: $\beta = 0.625$ balances the reconstruction and distillation terms, and $\gamma = 0.001$ for the adversarial loss ($\gamma$ follows [66]).... We uniformly sample $t \in [20, 640]$... The learning rate is fixed at 1e-6 (with AdamW) for the one-step generator, fake network, and discriminator.
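
The Stage I settings quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustrative sketch built from the quoted numbers, not the authors' code. The function names (`stage1_loss`, `stage1_lr`, `sample_crop`) are hypothetical, steps are assumed to be 0-indexed, the crop size is assumed to be drawn once per batch, and the quoted batch sizes are used as stated without deciding whether they are per-GPU or totals across the 4 GPUs.

```python
import random

# Values quoted in the Experiment Setup row (Stage I).
LAMBDAS = (0.6, 1.0, 1.8, 2.9, 4.6, 7.4, 12.2)  # rate-distortion weights, one per bitrate
ALPHA = 0.001                                    # weight of the auxiliary code prediction loss
LR_STAGES = ((500_000, 5e-5), (200_000, 1e-5), (100_000, 1e-6))
TOTAL_STEPS = sum(steps for steps, _ in LR_STAGES)  # 800,000 steps in total
CROP_SIZES = (512, 1024)
CROP_PROBS = (0.6, 0.4)
BATCH_SIZE = {512: 32, 1024: 8}


def stage1_loss(recon: float, rate: float, aux: float,
                lam: float, alpha: float = ALPHA) -> float:
    """Combine the three terms as L_recon + lambda * R + alpha * L_aux."""
    return recon + lam * rate + alpha * aux


def stage1_lr(step: int) -> float:
    """Piecewise-constant learning rate for a 0-indexed step (assumed indexing)."""
    if not 0 <= step < TOTAL_STEPS:
        raise ValueError(f"step must be in [0, {TOTAL_STEPS}), got {step}")
    boundary = 0
    for num_steps, lr in LR_STAGES:
        boundary += num_steps
        if step < boundary:
            return lr
    raise AssertionError("unreachable: step was range-checked above")


def sample_crop(rng: random.Random) -> tuple[int, int]:
    """Draw a crop size with the quoted probabilities and look up its batch size.

    Sampling once per batch is an assumption; the quoted text does not say
    whether the draw happens per image or per batch.
    """
    size = rng.choices(CROP_SIZES, weights=CROP_PROBS, k=1)[0]
    return size, BATCH_SIZE[size]
```

For example, `stage1_lr(650_000)` falls in the second stage and returns `1e-5`, and `sample_crop(random.Random(0))` returns either `(512, 32)` or `(1024, 8)`. In a real training loop the schedule would more likely be attached to the optimizer through a scheduler than queried per step; the standalone function is used here only to make the stage boundaries explicit.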