Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation
Authors: Soobin Um, Beomsu Kim, Jong Chul Ye
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the empirical benefits of our approach, we conducted extensive experiments across various real-world benchmarks. Our comprehensive experiments demonstrate that Boost-and-Skip greatly enhances the capability of generating minority samples, even rivaling guidance-based state-of-the-art approaches while requiring significantly fewer computations. 4. Experiments: Datasets and pretrained models. Our experiments were conducted on four benchmark settings with varying resolutions: (i) CelebA 64×64 (Liu et al., 2015); (ii) LSUN-Bedrooms 256×256 (Yu et al., 2015); (iii) ImageNet 64×64 (Deng et al., 2009); and (iv) ImageNet 256×256. Table 1: Quantitative comparisons. Table 2: Exploring the design space of Boost-and-Skip. |
| Researcher Affiliation | Academia | 1Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea. Correspondence to: Jong Chul Ye <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes through mathematical equations and textual explanations, but it does not contain a clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | Code is available at https://github.com/soobin-um/BnS. |
| Open Datasets | Yes | Our experiments were conducted on four benchmark settings with varying resolutions: (i) CelebA 64×64 (Liu et al., 2015); (ii) LSUN-Bedrooms 256×256 (Yu et al., 2015); (iii) ImageNet 64×64 (Deng et al., 2009); and (iv) ImageNet 256×256. |
| Dataset Splits | No | The paper mentions using a "CelebA training set" and "CelebA test set" only in the context of data augmentation for downstream classification tasks (Table 4 and its description), without specifying explicit train/validation/test splits for the main experiments. |
| Hardware Specification | Yes | All measurements are based on a single NVIDIA A100 GPU. Our implementation is based on PyTorch (Paszke et al., 2019), and experiments were performed on twin NVIDIA A100 GPUs. |
| Software Dependencies | No | Our implementation is based on PyTorch (Paszke et al., 2019), and experiments were performed on twin NVIDIA A100 GPUs. For the EDM (Karras et al., 2022) baseline, we used the checkpoint given in the official project page of (Karras et al., 2022). The DiT (Peebles & Xie, 2022) baseline employed the pretrained model provided in the official code repository. |
| Experiment Setup | Yes | Our hyperparameter selection (γ², t) followed a two-step approach: first, we determined an appropriate t that ensures a non-negligible α(T_skip) (where T_skip := T − t), and then we performed a grid search to select γ². We empirically found that our framework is not that sensitive to the choice of t, and in practice, setting t such that α(T_skip) > 0.01 generally yields strong performance on low-resolution datasets (e.g., CelebA and ImageNet 64×64). For high-resolution benchmarks (e.g., LSUN-Bedrooms), a lower threshold of α(T_skip) > 0.005 was sufficient, as these datasets are more sensitive to noise intensity (Nichol & Dhariwal, 2021). For CelebA, we conducted a grid search over γ² = {4.0, 8.0, 12.0, 16.0, 18.0, 20.0}, while for LSUN-Bedrooms, we searched over γ² = {2.0, 4.0, 6.0, 7.0, 7.5, 8.0}. For the ImageNet results, the search was performed over γ² = {2.0, 4.0, 6.0, 6.5, 7.0, 8.0}. Based on this, we selected the following final values: (i) (γ², t) = (18.0, 3) for CelebA; (ii) (γ², t) = (7.5, 0) for LSUN-Bedrooms; and (iii) (γ², t) = (6.5, 3) for the ImageNet cases. We employed a global setting of 250 timesteps for sampling across all diffusion-based samplers, including both the baseline methods and our approach. |
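The two-step hyperparameter selection quoted in the Experiment Setup row can be sketched as follows. This is an illustrative reconstruction, not code from the BnS repository: it assumes α(T_skip) refers to the cumulative DDPM signal coefficient ᾱ, uses a hypothetical linear-β schedule with 1000 steps, and replaces the authors' evaluation with a placeholder `score_fn` standing in for a quality metric such as FID.

```python
# Sketch of the paper's two-step hyperparameter selection.
# The linear-beta schedule and the `score_fn` metric are hypothetical
# stand-ins, NOT taken from the Boost-and-Skip implementation.

N = 1000  # assumed length of the pretrained model's noise schedule

def alpha_bar(t, n=N):
    """Cumulative signal coefficient: product of (1 - beta_i) for i < t,
    with a linear beta schedule from 1e-4 to 0.02 (a common default)."""
    prod = 1.0
    for i in range(t):
        beta = 1e-4 + (0.02 - 1e-4) * i / (n - 1)
        prod *= 1.0 - beta
    return prod

def choose_t(threshold=0.01, n=N):
    """Step 1: smallest t with alpha_bar(T_skip) > threshold,
    where T_skip := n - t (the first t reverse steps are skipped)."""
    for t in range(n):
        if alpha_bar(n - t, n) > threshold:
            return t
    return n - 1

def choose_gamma_sq(candidates, score_fn):
    """Step 2: grid search over gamma^2, keeping the best-scoring value
    (`score_fn` stands in for an FID-style evaluation)."""
    return max(candidates, key=score_fn)

t = choose_t(threshold=0.01)             # low-resolution threshold
gamma_sq = choose_gamma_sq(
    [4.0, 8.0, 12.0, 16.0, 18.0, 20.0],  # CelebA grid from the paper
    score_fn=lambda g: -abs(g - 18.0),   # dummy metric for illustration
)
print(t, gamma_sq)
```

With the dummy metric the grid search trivially picks 18.0, matching the paper's CelebA choice; in practice `score_fn` would require generating samples and scoring them, which dominates the cost of this procedure.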