Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
One-Step Diffusion Distillation via Deep Equilibrium Models
Authors: Zhengyang Geng, Ashwini Pokle, J. Zico Kolter
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our proposed Generative Equilibrium Transformer (GET) in offline distillation of diffusion models through a series of experiments on single-step class-conditional and unconditional image generation. [...] We report all our results on CIFAR-10 |
| Researcher Affiliation | Collaboration | Zhengyang Geng Carnegie Mellon University EMAIL Ashwini Pokle Carnegie Mellon University EMAIL J. Zico Kolter Carnegie Mellon University Bosch Center for AI EMAIL |
| Pseudocode | No | The paper describes the architecture and process in text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, checkpoints, and datasets are available here. |
| Open Datasets | Yes | We report all our results on CIFAR-10 [52] |
| Dataset Splits | No | The paper uses CIFAR-10, a standard dataset, but does not explicitly mention training/validation/test splits or a validation set. |
| Hardware Specification | Yes | The entire process of data generation takes about 4 hours on 4 NVIDIA A6000 GPUs using PyTorch [74] Distributed Data Parallel (DDP) and a batch size of 128 per GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [74]' but does not specify the version number of the software. |
| Experiment Setup | Yes | We use AdamW [63] optimizer with a learning rate of 1e-4, a batch size of 128 (denoted as 1 BS), and 800k training iterations, which are identical to Progressive Distillation (PD) [88]. For conditional models, we adopt a batch size of 256 (2 BS). No warm-up, weight decay, or learning rate decay is applied. We convert input noise to patches of size 2×2. We use 6 steps of fixed point iterations in the forward pass of GET-DEQ and differentiate through it. |
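
The settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the class name `GetTrainConfig`, its field names, and the helper functions are hypothetical, the 32×32 image size is the standard CIFAR-10 resolution rather than something stated in the table, and the toy update map stands in for the GET-DEQ layer purely to show what "6 steps of fixed point iterations" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetTrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    batch_size: int = 128            # 256 is reported for class-conditional models
    train_iterations: int = 800_000
    weight_decay: float = 0.0        # "no weight decay"
    warmup_iterations: int = 0       # "no warm-up"
    patch_size: int = 2              # noise is split into 2x2 patches
    fixed_point_steps: int = 6       # forward-pass iterations of GET-DEQ


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def fixed_point_iterate(f, z0, steps):
    """Apply z <- f(z) a fixed number of times, starting from z0."""
    z = z0
    for _ in range(steps):
        z = f(z)
    return z


cfg = GetTrainConfig()

# Assumption: CIFAR-10 images (and hence the input noise) are 32x32.
tokens = num_patches(32, cfg.patch_size)

# Total training samples seen at the unconditional batch size.
samples_seen = cfg.batch_size * cfg.train_iterations

# Toy contraction z -> 0.5 * z + 1 (fixed point at 2), NOT the GET-DEQ update.
z = fixed_point_iterate(lambda z: 0.5 * z + 1.0, 0.0, cfg.fixed_point_steps)
```

Under these assumptions, a 32×32 input yields 256 patch tokens, and the toy iteration moves toward its fixed point at 2 without reaching it in six steps, which is why a fixed step budget is an approximation rather than an exact solve.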