One-Step Diffusion Distillation via Deep Equilibrium Models

Authors: Zhengyang Geng, Ashwini Pokle, J. Zico Kolter

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate the effectiveness of our proposed Generative Equilibrium Transformer (GET) in offline distillation of diffusion models through a series of experiments on single-step class-conditional and unconditional image generation. [...] We report all our results on CIFAR-10 |
| Researcher Affiliation | Collaboration | Zhengyang Geng, Carnegie Mellon University, zgeng2@cs.cmu.edu; Ashwini Pokle, Carnegie Mellon University, apokle@cs.cmu.edu; J. Zico Kolter, Carnegie Mellon University and Bosch Center for AI, zkolter@cs.cmu.edu |
| Pseudocode | No | The paper describes the architecture and process in text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, checkpoints, and datasets are available here. |
| Open Datasets | Yes | We report all our results on CIFAR-10 [52] |
| Dataset Splits | No | The paper uses CIFAR-10, a standard dataset, but does not explicitly mention training/validation/test splits or a validation set. |
| Hardware Specification | Yes | The entire process of data generation takes about 4 hours on 4 NVIDIA A6000 GPUs using PyTorch [74] Distributed Data Parallel (DDP) and a batch size of 128 per GPU. (See the data-generation sketch below.) |
| Software Dependencies | No | The paper mentions 'PyTorch [74]' but does not specify software version numbers. |
| Experiment Setup | Yes | We use the AdamW [63] optimizer with a learning rate of 1e-4, a batch size of 128 (denoted as 1 BS), and 800k training iterations, which are identical to Progressive Distillation (PD) [88]. For conditional models, we adopt a batch size of 256 (2 BS). No warm-up, weight decay, or learning rate decay is applied. We convert input noise to patches of size 2×2. We use 6 steps of fixed-point iterations in the forward pass of GET-DEQ and differentiate through it. (See the training-setup sketch below.) |
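
The Hardware Specification row reports that the distillation data is generated with PyTorch Distributed Data Parallel on 4 GPUs at a batch size of 128 per GPU. The sketch below is a minimal, hypothetical version of such a multi-process generation loop; `load_teacher_sampler`, the dummy `Identity` teacher, the output file naming, and `num_batches` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical multi-GPU (noise, image) pair generation, launched with e.g.
#   torchrun --nproc_per_node=4 generate_pairs.py
# Only "PyTorch DDP, 4 GPUs, batch size 128 per GPU, ~4 hours" comes from the paper.
import torch
import torch.distributed as dist

def load_teacher_sampler():
    # Placeholder for the pretrained diffusion teacher and its deterministic sampler;
    # an Identity module keeps this sketch self-contained.
    return torch.nn.Identity()

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank}")
    torch.cuda.set_device(device)

    sampler = load_teacher_sampler().to(device)
    per_gpu_batch = 128                      # batch size per GPU, as reported
    num_batches = 100                        # assumed; the total dataset size is not given in this excerpt

    for step in range(num_batches):
        noise = torch.randn(per_gpu_batch, 3, 32, 32, device=device)  # CIFAR-10 resolution
        with torch.no_grad():
            images = sampler(noise)          # teacher maps noise to a clean sample
        torch.save({"noise": noise.cpu(), "images": images.cpu()},
                   f"pairs_rank{rank}_step{step}.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

With 4 ranks producing 128 samples each, every loop iteration yields 512 noise/image pairs across the machine.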
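
The Experiment Setup row quotes AdamW at learning rate 1e-4 with no warm-up, weight decay, or learning-rate decay, 2×2 patchification of the input noise, and 6 fixed-point iterations that are differentiated through directly. The sketch below illustrates those choices under stated assumptions: `TinyGETBlock`, the 256-dimensional embedding, and the MSE regression target are illustrative stand-ins, not the paper's GET architecture or its distillation loss.

```python
# Minimal training-setup sketch: 2x2 patchify, 6 unrolled fixed-point steps with
# plain backprop through them, and AdamW at lr 1e-4 with no weight decay.
import torch
import torch.nn as nn
import torch.nn.functional as F

def patchify(x, patch=2):
    # (B, C, H, W) -> (B, (H/patch)*(W/patch), C*patch*patch) tokens, i.e. 2x2 patches of the input noise.
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // patch, patch, w // patch, patch)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(b, (h // patch) * (w // patch), c * patch * patch)

class TinyGETBlock(nn.Module):
    """Weight-tied stand-in for the GET equilibrium block (not the real transformer)."""
    def __init__(self, in_dim=3 * 2 * 2, dim=256):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.f = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, noise_tokens, num_iters=6):
        x = self.embed(noise_tokens)          # injected input, kept fixed across iterations
        z = torch.zeros_like(x)               # equilibrium state
        for _ in range(num_iters):            # 6 fixed-point steps, unrolled
            z = self.f(z + x)
        return z

model = TinyGETBlock()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)  # no warm-up or LR decay either

noise = torch.randn(128, 3, 32, 32)           # per-step batch of 128 ("1 BS"), CIFAR-10 resolution
target = torch.randn(128, 256, 256)           # stand-in for the teacher-derived regression target
loss = F.mse_loss(model(patchify(noise)), target)
loss.backward()
opt.step()
```

Because the six iterations are simply unrolled, autograd backpropagates through every step, which is one way to realize the quoted "differentiate through it" as opposed to the implicit-gradient training often used for DEQs.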