Data-free Distillation of Diffusion Models with Bootstrapping
Authors: Jiatao Gu, Chen Wang, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Joshua M. Susskind
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we first demonstrate the efficacy of BOOT on various challenging image generation benchmarks, including unconditional and class-conditional settings. Next, we show that the proposed method can be easily adopted to distill text-to-image diffusion models. |
| Researcher Affiliation | Collaboration | 1 Apple, 2 University of Pennsylvania. Correspondence to: Jiatao Gu <jiatao@apple.com>. |
| Pseudocode | Yes | Algorithm 1 Distillation using BOOT for Conditional Diffusion Models. |
| Open Source Code | No | The paper mentions using open-sourced models as teachers but does not provide any statement or link indicating that the code for their proposed method (BOOT) is open-source or publicly available. |
| Open Datasets | Yes | FFHQ (https://github.com/NVlabs/ffhq-dataset) contains 70k images of real human faces in resolution of 1024×1024. ... ImageNet-1K (https://image-net.org/download.php) contains 1.28M images across 1000 classes. ... Specifically, we utilize DiffusionDB (Wang et al., 2022), a large-scale prompt dataset that contains 14 million images generated by Stable Diffusion using prompts provided by real users. ... DiffusionDB (https://poloclub.github.io/diffusiondb/) contains 14M images generated by Stable Diffusion using prompts and hyperparameters specified by users. |
| Dataset Splits | Yes | For text-to-image tasks, we measure the zero-shot CLIP score (Radford et al., 2021) to assess the faithfulness of generation given 5000 randomly sampled captions from the COCO2017 (Lin et al., 2014) validation set. (A scoring sketch follows the table.) |
| Hardware Specification | Yes | In addition, we report the speed by fps on a single A100 GPU. (A timing sketch follows the table.) |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9") needed to replicate the experiment. |
| Experiment Setup | Yes | Table 3. Hyperparameters used for training BOOT. The table includes specific details such as denoising resolution (e.g., 64×64), base channels (e.g., 128), channel multipliers (e.g., 1, 2, 3, 4), bootstrapping step size (e.g., 0.04), CFG weight (e.g., 1, 5), learning rate (e.g., 1e-4), batch size (e.g., 128), and training iterations (e.g., 500k). (A configuration sketch follows the table.) |
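
The zero-shot CLIP-score evaluation quoted in the Dataset Splits row can be approximated with off-the-shelf CLIP. The sketch below is an assumption about the protocol, not the authors' code: the CLIP checkpoint choice and the `generate_image` callable (a stand-in for the distilled student's sampler) are illustrative placeholders.

```python
# Hedged sketch: average CLIP score over captions sampled from the COCO2017
# validation set. The checkpoint and helper names are assumptions, not the paper's.
import torch
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def mean_clip_score(captions, generate_image):
    scores = []
    for caption in captions:
        image = generate_image(caption)  # PIL.Image produced by the distilled student
        inputs = processor(text=[caption], images=image, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
        sim = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
        scores.append(max(100.0 * sim, 0.0))  # common CLIP-score convention
    return sum(scores) / len(scores)
```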
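
The Hardware Specification row reports sampling speed as fps on a single A100. A minimal timing sketch is given below; the warm-up length, batch size, and `sample_batch` callable are assumptions for illustration.

```python
# Hedged sketch of an fps measurement for a sampler running on one GPU.
import time
import torch

@torch.no_grad()
def frames_per_second(sample_batch, batch_size=16, warmup=3, trials=10):
    for _ in range(warmup):            # warm-up runs to exclude one-time setup costs
        sample_batch(batch_size)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(trials):
        sample_batch(batch_size)
    torch.cuda.synchronize()           # wait for asynchronous GPU work to finish
    elapsed = time.perf_counter() - start
    return trials * batch_size / elapsed
```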
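
The hyperparameters listed in the Experiment Setup row (from the paper's Table 3) can be gathered into a single training configuration. The dataclass and its field names below are illustrative only; the values are the ones quoted above for the 64×64 setting.

```python
# Hedged sketch: Table 3 hyperparameters collected into a config object.
from dataclasses import dataclass

@dataclass
class BootDistillConfig:
    denoising_resolution: int = 64           # 64x64 denoising resolution
    base_channels: int = 128                  # U-Net base channel width
    channel_multipliers: tuple = (1, 2, 3, 4) # per-stage channel multipliers
    bootstrapping_step_size: float = 0.04     # step size used for bootstrapping
    cfg_weight: float = 5.0                   # paper reports 1 or 5 depending on setting
    learning_rate: float = 1e-4
    batch_size: int = 128
    training_iterations: int = 500_000        # 500k iterations
```

Such a config would typically drive a standard optimizer setup, e.g. Adam at the quoted learning rate of 1e-4, though the paper does not state the optimizer in the quoted excerpt.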