Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation
Authors: Soobin Um, Beomsu Kim, Jong Chul Ye
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the empirical benefits of our approach, we conducted extensive experiments across various real-world benchmarks. Our comprehensive experiments demonstrate that Boost-and-Skip greatly enhances the capability of generating minority samples, even rivaling guidance-based state-of-the-art approaches while requiring significantly fewer computations. 4. Experiments: Datasets and pretrained models. Our experiments were conducted on four benchmark settings with varying resolutions: (i) CelebA 64×64 (Liu et al., 2015); (ii) LSUN-Bedrooms 256×256 (Yu et al., 2015); (iii) ImageNet 64×64 (Deng et al., 2009); and (iv) ImageNet 256×256. Table 1: Quantitative comparisons. Table 2: Exploring the design space of Boost-and-Skip. |
| Researcher Affiliation | Academia | 1Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea. Correspondence to: Jong Chul Ye <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes through mathematical equations and textual explanations, but it does not contain a clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | Code is available at https://github.com/soobin-um/BnS. |
| Open Datasets | Yes | Our experiments were conducted on four benchmark settings with varying resolutions: (i) CelebA 64×64 (Liu et al., 2015); (ii) LSUN-Bedrooms 256×256 (Yu et al., 2015); (iii) ImageNet 64×64 (Deng et al., 2009); and (iv) ImageNet 256×256. |
| Dataset Splits | No | The paper mentions using a "CelebA training set" and "CelebA test set" only in the context of data augmentation for downstream classification tasks (Table 4 and its description), without specifying explicit train/validation/test splits for the main experiments. |
| Hardware Specification | Yes | All measurements are based on a single NVIDIA A100 GPU. Our implementation is based on PyTorch (Paszke et al., 2019), and experiments were performed on twin NVIDIA A100 GPUs. |
| Software Dependencies | No | Our implementation is based on PyTorch (Paszke et al., 2019), and experiments were performed on twin NVIDIA A100 GPUs. For the EDM (Karras et al., 2022) baseline, we used the checkpoint given in the official project page of (Karras et al., 2022). The DiT (Peebles & Xie, 2022) baseline employed the pretrained model provided in the official code repository. |
| Experiment Setup | Yes | Our hyperparameter selection (γ², t) followed a two-step approach: first, we determined an appropriate t that ensures a non-negligible α(T_skip) (where T_skip := T − t), and then we performed a grid search to select γ². We empirically found that our framework is not that sensitive to the choice of t, and in practice, setting t such that α(T_skip) > 0.01 generally yields strong performance on low-resolution datasets (e.g., CelebA and ImageNet 64×64). For high-resolution benchmarks (e.g., LSUN-Bedrooms), a lower threshold of α(T_skip) > 0.005 was sufficient, as these datasets are more sensitive to noise intensity (Nichol & Dhariwal, 2021). For CelebA, we conducted a grid search over γ² = {4.0, 8.0, 12.0, 16.0, 18.0, 20.0}, while for LSUN-Bedrooms, we searched over γ² = {2.0, 4.0, 6.0, 7.0, 7.5, 8.0}. For the ImageNet results, the search was performed over γ² = {2.0, 4.0, 6.0, 6.5, 7.0, 8.0}. Based on this, we selected the following final values: (i) (γ², t) = (18.0, 3) for CelebA; (ii) (γ², t) = (7.5, 0) for LSUN-Bedrooms; and (iii) (γ², t) = (6.5, 3) for the ImageNet cases. We employed a global setting of 250 timesteps for sampling across all diffusion-based samplers, including both the baseline methods and our approach. |
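The two-step hyperparameter selection quoted in the Experiment Setup row can be sketched as follows. This is an illustrative reconstruction, not code from the BnS repository: it assumes α(T_skip) refers to the cumulative DDPM signal coefficient ᾱ, uses a hypothetical linear-β schedule with 1000 steps, and replaces the authors' evaluation with a placeholder `score_fn` standing in for a quality metric such as FID.

```python
# Sketch of the paper's two-step hyperparameter selection.
# The linear-beta schedule and the `score_fn` metric are hypothetical
# stand-ins, NOT taken from the Boost-and-Skip implementation.

N = 1000  # assumed length of the pretrained model's noise schedule

def alpha_bar(t, n=N):
    """Cumulative signal coefficient: product of (1 - beta_i) for i < t,
    with a linear beta schedule from 1e-4 to 0.02 (a common default)."""
    prod = 1.0
    for i in range(t):
        beta = 1e-4 + (0.02 - 1e-4) * i / (n - 1)
        prod *= 1.0 - beta
    return prod

def choose_t(threshold=0.01, n=N):
    """Step 1: smallest t with alpha_bar(T_skip) > threshold,
    where T_skip := n - t (the first t reverse steps are skipped)."""
    for t in range(n):
        if alpha_bar(n - t, n) > threshold:
            return t
    return n - 1

def choose_gamma_sq(candidates, score_fn):
    """Step 2: grid search over gamma^2, keeping the best-scoring value
    (`score_fn` stands in for an FID-style evaluation)."""
    return max(candidates, key=score_fn)

t = choose_t(threshold=0.01)             # low-resolution threshold
gamma_sq = choose_gamma_sq(
    [4.0, 8.0, 12.0, 16.0, 18.0, 20.0],  # CelebA grid from the paper
    score_fn=lambda g: -abs(g - 18.0),   # dummy metric for illustration
)
print(t, gamma_sq)
```

With the dummy metric the grid search trivially picks 18.0, matching the paper's CelebA choice; in practice `score_fn` would require generating samples and scoring them, which dominates the cost of this procedure.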