Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

On Inductive Biases That Enable Generalization in Diffusion Transformers

Authors: Jie An, De Wang, Pengsheng Guo, Jiebo Luo, Alex Schwing

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the CelebA, ImageNet, MSCOCO, and LSUN data show that strengthening the inductive bias of a DiT can improve both generalization and generation quality when less training data is available.
Researcher Affiliation | Collaboration | Jie An¹,², De Wang¹, Pengsheng Guo¹, Jiebo Luo², Alexander G. Schwing¹; ¹Apple, ²University of Rochester
Pseudocode | No | The paper describes methods using mathematical equations and narrative text, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | Source code is available at https://github.com/DiT-Generalization/DiT-Generalization.
Open Datasets | Yes | Experimental results on the CelebA, ImageNet, MSCOCO, and LSUN data show that strengthening the inductive bias of a DiT can improve both generalization and generation quality when less training data is available. All datasets used in this paper are publicly available.
Dataset Splits | Yes | To elaborate, given K images from either training or testing set, we first feed noisy images at step t to diffusion models and obtain the estimated noise $\hat{\epsilon}$. Next, we compute the one-step denoising result $\hat{x}_0$ via Eq. (3). Finally, we derive the training and testing PSNRs at diffusion step t as follows: $\mathrm{PSNR}(t) = \frac{1}{K} \sum_{k=1}^{K} 10 \log_{10}\left(\frac{M^2}{\mathrm{MSE}(x_0^k, \hat{x}_0^k)}\right)$... We have clearly presented all experimental settings in Appendix A.
Hardware Specification | Yes | In Appendix A, we report that all models were trained using 4 or 8 A100/H100 GPUs, and all checkpoints were taken at 400k training steps.
Software Dependencies | No | Our implementation builds on the official DiT codebase, which is also publicly accessible. Specific version numbers for software dependencies are not provided in the paper's main text or accompanying checklist justifications.
Experiment Setup | Yes | We train both the vanilla DiT and a DiT equipped with local attentions with $N = 10^3$, $10^4$, and $10^5$ images for the same 400k training steps. We have clearly presented all experimental settings in Appendix A.
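
The PSNR measure quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not taken from the paper's codebase: the function names `mse` and `psnr_at_step`, the flat-list image representation, and the default `max_val=1.0` (standing in for M) are assumptions made for illustration. It also assumes the one-step denoised estimates at step t have already been computed, since Eq. (3) is not reproduced in this summary.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two images given as flat pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def psnr_at_step(
    x0_batch: Sequence[Sequence[float]],
    x0_hat_batch: Sequence[Sequence[float]],
    max_val: float = 1.0,  # hypothetical stand-in for M, the maximum pixel value
) -> float:
    """Average PSNR (in dB) over K clean images and their denoised estimates."""
    if len(x0_batch) != len(x0_hat_batch) or len(x0_batch) == 0:
        raise ValueError("batches must be non-empty and have the same size")
    # Per-image PSNR: 10 * log10(M^2 / MSE). A perfect reconstruction
    # (MSE == 0) would divide by zero and is not handled in this sketch.
    per_image = [
        10.0 * math.log10(max_val ** 2 / mse(x0, x0_hat))
        for x0, x0_hat in zip(x0_batch, x0_hat_batch)
    ]
    # Mean over the K images, matching the 1/K sum in the quoted formula.
    return sum(per_image) / len(per_image)
```

Calling `psnr_at_step` once on a batch of training images and once on a batch of test images at the same diffusion step t yields the training and testing PSNRs that the quoted passage describes.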