Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models

Authors: Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany, Eunseop Yoon, Chang D. Yoo

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experimental Results 4.1 Experimental Setup Datasets and Models. We fine-tune both Stable Diffusion 1.5 [18] (Creativeml-openrail-m License) and SDXL [19] (Openrail++ License) models using the GradSPO objective, as detailed in Section 3.
Researcher Affiliation	Academia	Korea Advanced Institute of Science and Technology (KAIST) EMAIL
Pseudocode	No	The paper describes methods using mathematical formulations and textual explanations but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	Code and models are available at https://github.com/JoshuaTTJ/GradSPO.
Open Datasets	Yes	We train the models on 4,000 randomly sampled prompts from the Pick-a-Pic v1 dataset [20], which contains 580,000 pairs of image preference for each prompt. For evaluation, unless stated otherwise, we used the test set consisting of 500 prompts sourced from the Pick-a-Pic v2 dataset, similar to previous work in the field [12, 10].
Dataset Splits	Yes	Following the SPO training scheme, we train the models on 4,000 randomly sampled prompts from the Pick-a-Pic v1 dataset [20]... For evaluation, unless stated otherwise, we used the test set consisting of 500 prompts sourced from the Pick-a-Pic v2 dataset, similar to previous work in the field [12, 10].
Hardware Specification	Yes	GPU Setup 4x NVIDIA A100
Software Dependencies	No	The paper mentions using AdamW [38] as the optimizer, but does not give version numbers for software frameworks (e.g., PyTorch, TensorFlow) or other key libraries used for implementation.
Experiment Setup	Yes	Hyperparameters (SD 1.5 / SDXL): learning rate 6e-5 / 1e-5; # of epochs 10 / 10; batch size 40 / 16; µ 0.9 / 0.9; β 10 / 10; κ [0, 750] / [0, 750]; LoRA rank 4 / 64; cfg during training 5.0 / 5.0; # of samples per step 4 / 4; sampling steps during training 20 / 20; GPU setup 4x NVIDIA A100 / 4x NVIDIA A100. Additionally, the time-dependent weight function αt is set to 1, the guidance scale γt is fixed at 0.5, and the Exponential Moving Average (EMA) decay rate µ is set to 0.9.
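
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that transcribes them into a Python dataclass. The class name, field names, and the `differing_fields` helper are hypothetical and chosen only for illustration; they are not taken from the paper or its repository. Only the numeric values come from the table row above.

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GradSPOConfig:
    """Reported hyperparameters; all field names are illustrative assumptions."""

    # Values that differ between the two base models.
    learning_rate: float
    batch_size: int
    lora_rank: int
    # Values reported as identical for SD 1.5 and SDXL.
    num_epochs: int = 10
    ema_decay_mu: float = 0.9
    beta: float = 10.0
    kappa_range: Tuple[int, int] = (0, 750)
    train_cfg_scale: float = 5.0
    samples_per_step: int = 4
    train_sampling_steps: int = 20
    alpha_t: float = 1.0  # time-dependent weight function, set to 1
    guidance_scale_gamma_t: float = 0.5
    gpu_setup: str = "4x NVIDIA A100"


SD15 = GradSPOConfig(learning_rate=6e-5, batch_size=40, lora_rank=4)
SDXL = GradSPOConfig(learning_rate=1e-5, batch_size=16, lora_rank=64)


def differing_fields(a: GradSPOConfig, b: GradSPOConfig) -> List[str]:
    """Return the sorted names of fields whose values differ between a and b."""
    da, db = asdict(a), asdict(b)
    return sorted(name for name in da if da[name] != db[name])


if __name__ == "__main__":
    print(differing_fields(SD15, SDXL))
```

Under these assumptions, only the learning rate, batch size, and LoRA rank differ between the two setups; every other reported value is shared.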