Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Data-free Distillation of Diffusion Models with Bootstrapping
Authors: Jiatao Gu, Chen Wang, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Joshua M. Susskind
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we first demonstrate the efficacy of BOOT on various challenging image generation benchmarks, including unconditional and class-conditional settings. Next, we show that the proposed method can be easily adopted to distill text-to-image diffusion models. |
| Researcher Affiliation | Collaboration | 1Apple 2University of Pennsylvania. Correspondence to: Jiatao Gu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Distillation using BOOT for Conditional Diffusion Models. |
| Open Source Code | No | The paper mentions using open-sourced models as teachers but does not provide any statement or link indicating that the code for their proposed method (BOOT) is open-source or publicly available. |
| Open Datasets | Yes | FFHQ (https://github.com/NVlabs/ffhq-dataset) contains 70k images of real human faces in resolution of 1024×1024. ... ImageNet-1K (https://image-net.org/download.php) contains 1.28M images across 1000 classes. ... Specifically, we utilize DiffusionDB (Wang et al., 2022), a large-scale prompt dataset that contains 14 million images generated by Stable Diffusion using prompts provided by real users. ... DiffusionDB (https://poloclub.github.io/diffusiondb/) contains 14M images generated by Stable Diffusion using prompts and hyperparameters specified by users. |
| Dataset Splits | Yes | For text-to-image tasks, we measure the zero-shot CLIP score (Radford et al., 2021) for measuring the faithfulness of generation given 5000 randomly sampled captions from COCO2017 (Lin et al., 2014) validation set. |
| Hardware Specification | Yes | In addition, we report the speed by fps on a single A100 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9") needed to replicate the experiment. |
| Experiment Setup | Yes | Table 3. Hyperparameters used for training BOOT. The table includes specific details such as Denoising resolution (e.g., 64x64), Base channels (e.g., 128), Multipliers (e.g., 1,2,3,4), Bootstrapping step size (e.g., 0.04), CFG weight (e.g., 1, 5), Learning rate (e.g., 1e-4), Batch size (e.g., 128), and Training iterations (e.g., 500k). |
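The Experiment Setup row above reports the paper's Table 3 hyperparameters in prose. As a minimal sketch, they can be collected into a plain config dict; the key names here are our own choice for illustration, since the paper does not release code:

```python
# Hedged sketch: hyperparameters as reported in Table 3 of the paper,
# gathered into a config dict. Key names are illustrative assumptions,
# not identifiers from any released codebase.
boot_config = {
    "denoising_resolution": (64, 64),   # e.g., 64x64
    "base_channels": 128,
    "channel_multipliers": (1, 2, 3, 4),
    "bootstrapping_step_size": 0.04,
    "cfg_weights": (1, 5),              # classifier-free guidance weights
    "learning_rate": 1e-4,
    "batch_size": 128,
    "training_iterations": 500_000,     # 500k
}

def describe(cfg: dict) -> str:
    """Render the config as a single human-readable summary line."""
    return ", ".join(f"{key}={value}" for key, value in cfg.items())

print(describe(boot_config))
```

Keeping the reported values in one structure like this makes it easy to diff a reimplementation's settings against the paper's stated setup.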