U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Authors: Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models.'
Researcher Affiliation | Collaboration | 1 State Key Lab of General AI, School of Intelligence Science and Technology, Peking University. 2 Huawei Noah's Ark Lab.
Pseudocode | No | The paper describes methods in prose and includes architectural diagrams (Figure 3) but does not contain a formal pseudocode or algorithm block.
Open Source Code | Yes | 'Codes are available at https://github.com/YuchuanTian/U-DiT.'
Open Datasets | Yes | 'The training is conducted with the training set of ImageNet 2012 [12].'
Dataset Splits | No | The paper states 'The training is conducted with the training set of ImageNet 2012 [12]' and evaluates on ImageNet 256×256 and ImageNet 512×512, but does not explicitly provide percentages or sample counts for training, validation, and test splits.
Hardware Specification | Yes | 'We used 8 NVIDIA A100s (80G) to train U-DiT-B and U-DiT-L models.'
Software Dependencies | No | The paper mentions using sd-vae-ft-ema, the AdamW optimizer, MindSpore, CANN, and the Ascend AI Processor, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | 'The same VAE (i.e. sd-vae-ft-ema) for latent diffusion models [29] and the AdamW optimizer is adopted. The training hyperparameters are kept unchanged, including global batch size 256, learning rate 1e-4, weight decay 0, and global seed 0.'