Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

Authors: Liang Peng, Boxi Wu, Haoran Cheng, Yibo Zhao, Xiaofei He

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments with Stable Diffusion 1.5 and Stable Diffusion XL confirm that our method delivers substantial gains. (Section 4, Experiments)
Researcher Affiliation | Collaboration | Liang Peng (1), Boxi Wu (2), Haoran Cheng (2), Yibo Zhao (1, 2), Xiaofei He (1, 2); (1) FABU Inc., (2) Zhejiang University
Pseudocode | No | The paper describes the method narratively in Section 3, 'Self-DPO for Text-to-image Diffusion Models', without presenting any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes]
Open Datasets | Yes | Our training data is sourced from the Pick-a-Pic V2 dataset [45]... We conduct evaluation on three datasets: Pick-a-Pic V2 [45] validation set (contains 500 prompts), Parti Prompts [46] (contains 1632 prompts, including diverse categories and challenge aspects), and HPDv2 [24] (contains 3200 prompts...)
Dataset Splits | Yes | Our training data is sourced from the Pick-a-Pic V2 dataset [45]... We conduct evaluation on three datasets: Pick-a-Pic V2 [45] validation set (contains 500 prompts)...
Hardware Specification | Yes | For SD 1.5, a batch size of 2048 pairs (resolution: 512 × 512) is maintained by training across 4 NVIDIA A100 GPUs. Each GPU handles 8 pairs locally with gradient accumulation over 64 steps.
Software Dependencies | No | The paper mentions optimizers such as AdamW [47] and Adafactor [48] but does not specify versions of programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA) in the main text.
Experiment Setup | Yes | Implementation Details: Following Diffusion-DPO [10], for the SD 1.5 [1] experiments, AdamW [47] is utilized, while SDXL [2] training is conducted with Adafactor [48] to conserve memory. Following the official implementation in Diffusion-DPO [10], C in Equation 8 is set to 2500. For SD 1.5, a batch size of 2048 pairs (resolution: 512 × 512) is maintained by training across 4 NVIDIA A100 GPUs... For SDXL... we use a total batch size of 96 pairs (resolution: 1024 × 1024). Training is performed at fixed square resolutions. We use a learning rate of 1e-6 coupled with a 25% linear warmup.
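The SD 1.5 batch size quoted in the table follows from data-parallel training with gradient accumulation: each optimizer update combines gradients from every GPU's local micro-batches across all accumulation steps. A minimal Python check of that arithmetic, using only the figures from the table (`effective_batch_size` is a hypothetical helper, not code from the paper):

```python
def effective_batch_size(num_gpus: int, pairs_per_gpu: int, grad_accum_steps: int) -> int:
    """Preference pairs contributing to one optimizer update.

    Hypothetical helper: under data parallelism each GPU processes
    `pairs_per_gpu` pairs per forward/backward pass, and gradients are
    accumulated over `grad_accum_steps` passes before the optimizer step.
    """
    return num_gpus * pairs_per_gpu * grad_accum_steps


# SD 1.5 setting reported in the table: 4 A100 GPUs, 8 pairs each, 64 accumulation steps.
sd15_pairs = effective_batch_size(num_gpus=4, pairs_per_gpu=8, grad_accum_steps=64)
print(sd15_pairs)  # 2048
```

For SDXL the table gives only the total (96 pairs), so the per-GPU split and the number of accumulation steps for that run cannot be recovered from this calculation.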
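The reported learning rate of 1e-6 with a 25% linear warmup can be sketched as a schedule function. This sketch assumes that the 25% refers to the fraction of total optimizer steps and that the rate stays constant after warmup (a constant-with-warmup schedule); the table states neither detail, and the run length of 2000 steps below is a hypothetical value.

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-6, warmup_frac: float = 0.25) -> float:
    """Learning rate for a given optimizer step.

    Assumption: linear ramp from 0 to `peak_lr` over the first
    `warmup_frac` of training, then held constant at `peak_lr`.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


# Hypothetical run length of 2000 optimizer steps, giving 500 warmup steps.
for s in (0, 250, 500, 1999):
    print(s, lr_at_step(s, total_steps=2000))
```

In a PyTorch training loop, the same shape is available as a multiplier on the optimizer's base rate via `transformers.get_constant_schedule_with_warmup(optimizer, num_warmup_steps)`.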
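The table notes that C in Equation 8 is set to 2500, following the official Diffusion-DPO implementation. Equation 8 itself is not reproduced here, so the sketch below assumes it has the Diffusion-DPO form, in which C multiplies the difference between how much the trained model and the frozen reference model reduce denoising error on the preferred image relative to the dispreferred one. All names are hypothetical, and per-pair denoising errors are passed in as precomputed scalars rather than produced by a diffusion model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pair_loss(err_w: float, err_l: float, ref_err_w: float, ref_err_l: float, c: float = 2500.0) -> float:
    """Diffusion-DPO-style loss for one preference pair (assumed form of Equation 8).

    err_w, err_l: denoising MSE of the trained model on the preferred (w)
        and dispreferred (l) image at the same noise level.
    ref_err_w, ref_err_l: the same errors under the frozen reference model.
    c: scaling constant; 2500 per the reported setup.
    """
    model_gap = err_w - err_l        # negative when the model denoises the preferred image better
    ref_gap = ref_err_w - ref_err_l  # the same gap for the reference model
    logit = -c * (model_gap - ref_gap)
    # -log(sigmoid(logit)) == softplus(-logit)
    return softplus(-logit)


def dpo_batch_loss(pairs, c: float = 2500.0) -> float:
    """Mean loss over (err_w, err_l, ref_err_w, ref_err_l) tuples."""
    losses = [dpo_pair_loss(*p, c=c) for p in pairs]
    return sum(losses) / len(losses)


print(dpo_pair_loss(0.10, 0.10, 0.10, 0.10))  # log(2): model matches reference
print(dpo_pair_loss(0.09, 0.11, 0.10, 0.10))  # near 0: model favors the preferred image
print(dpo_pair_loss(0.11, 0.09, 0.10, 0.10))  # about 50: model favors the dispreferred image
```

In the example, a gap of 0.02 in mean squared error already moves the loss from log 2 to nearly 0 or to about 50, which illustrates how strongly C = 2500 amplifies small differences in denoising error.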