Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models

Authors: Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany, Eunseop Yoon, Chang D. Yoo

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experimental Results), 4.1 Experimental Setup, Datasets and Models: "We fine-tune both Stable Diffusion 1.5 [18] (Creativeml-openrail-m License) and SDXL [19] (Openrail++ License) models using the GradSPO objective, as detailed in Section 3."
Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST)
Pseudocode | No | The paper describes its methods using mathematical formulations and textual explanations but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code and models are available at https://github.com/JoshuaTTJ/GradSPO.
Open Datasets | Yes | We train the models on 4,000 randomly sampled prompts from the Pick-a-Pic v1 dataset [20], which contains 580,000 pairs of image preference for each prompt. For evaluation, unless stated otherwise, we used the test set consisting of 500 prompts sourced from the Pick-a-Pic v2 dataset, similar to previous work in the field [12, 10].
Dataset Splits | Yes | Following the SPO training scheme, we train the models on 4,000 randomly sampled prompts from the Pick-a-Pic v1 dataset [20] ... For evaluation, unless stated otherwise, we used the test set consisting of 500 prompts sourced from the Pick-a-Pic v2 dataset, similar to previous work in the field [12, 10].
Hardware Specification | Yes | GPU setup: 4x NVIDIA A100
Software Dependencies | No | The paper mentions using AdamW [38] as the optimizer but does not specify versions of the software frameworks (e.g., PyTorch, TensorFlow) or other key libraries used for the implementation.
Experiment Setup | Yes | Hyperparameters:
  Hyperparameter | SD 1.5 | SDXL
  Learning rate | 6e-5 | 1e-5
  # of epochs | 10 | 10
  Batch size | 40 | 16
  µ | 0.9 | 0.9
  β | 10 | 10
  κ | [0, 750] | [0, 750]
  LoRA rank | 4 | 64
  CFG scale during training | 5.0 | 5.0
  # of samples per step | 4 | 4
  Sampling steps during training | 20 | 20
  GPU setup | 4x NVIDIA A100 | 4x NVIDIA A100
Additionally, the time-dependent weight function α_t is set to 1, the guidance scale γ_t is fixed at 0.5, and the Exponential Moving Average (EMA) decay rate µ is set to 0.9.
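For anyone attempting a re-run, the reported hyperparameters can be captured in a small typed config. The sketch below is illustrative only. The class name `GradSPOConfig`, the field names, and the `ema_update` helper are hypothetical and are not taken from the GradSPO repository. β and κ are recorded as reported, without interpretation. The listing does not say which quantity the EMA decay µ is applied to, so the update is shown generically as the standard rule ema ← µ · ema + (1 − µ) · current over a dict of scalars.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GradSPOConfig:
    """Hypothetical container for the reported hyperparameters (names are our own)."""
    learning_rate: float
    epochs: int
    batch_size: int
    ema_decay: float            # µ in the table
    beta: float                 # β in the table, recorded as reported
    kappa: Tuple[int, int]      # κ in the table, recorded as reported
    lora_rank: int
    cfg_scale: float            # classifier-free guidance scale during training
    samples_per_step: int
    sampling_steps: int
    alpha_t: float = 1.0        # time-dependent weight α_t, fixed per the text
    gamma_t: float = 0.5        # guidance scale γ_t, fixed per the text


# Values shared by both models in the reported table.
_SHARED = dict(epochs=10, ema_decay=0.9, beta=10.0, kappa=(0, 750),
               cfg_scale=5.0, samples_per_step=4, sampling_steps=20)

CONFIGS: Dict[str, GradSPOConfig] = {
    "SD 1.5": GradSPOConfig(learning_rate=6e-5, batch_size=40, lora_rank=4, **_SHARED),
    "SDXL": GradSPOConfig(learning_rate=1e-5, batch_size=16, lora_rank=64, **_SHARED),
}


def ema_update(ema: Dict[str, float], current: Dict[str, float],
               decay: float) -> Dict[str, float]:
    """Generic EMA step applied entry-wise: decay * ema + (1 - decay) * current."""
    return {k: decay * ema[k] + (1.0 - decay) * current[k] for k in ema}


if __name__ == "__main__":
    cfg = CONFIGS["SDXL"]
    # One EMA step on a toy scalar "parameter" using the reported µ = 0.9.
    print(ema_update({"w": 0.0}, {"w": 1.0}, cfg.ema_decay))
```

A frozen dataclass makes accidental mutation of a reported value an error, and keeping the shared values in one dict makes the SD 1.5 / SDXL differences (learning rate, batch size, LoRA rank) easy to see. Whether the batch size is per GPU or global across the 4x A100 setup is not stated in the listing, so the config stores it exactly as reported.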