U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
Authors: Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models. |
| Researcher Affiliation | Collaboration | 1 State Key Lab of General AI, School of Intelligence Science and Technology, Peking University. 2 Huawei Noah's Ark Lab. |
| Pseudocode | No | The paper describes methods in prose and includes architectural diagrams (Figure 3) but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Codes are available at https://github.com/YuchuanTian/U-DiT. |
| Open Datasets | Yes | The training is conducted with the training set of ImageNet 2012 [12]. |
| Dataset Splits | No | The paper states 'The training is conducted with the training set of ImageNet 2012 [12]' and evaluates on 'ImageNet 256×256' and 'ImageNet 512×512', but does not explicitly provide percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | Yes | We used 8 NVIDIA A100s (80G) to train U-DiT-B and U-DiT-L models. |
| Software Dependencies | No | The paper mentions using 'sd-vae-ft-ema', 'AdamW optimizer', 'MindSpore', 'CANN', and 'Ascend AI Processor', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The same VAE (i.e. sd-vae-ft-ema) for latent diffusion models [29] and the AdamW optimizer are adopted. The training hyperparameters are kept unchanged, including global batch size 256, learning rate 1e-4, weight decay 0, and global seed 0. |
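
As a rough illustration of the reported experiment setup, the sketch below wires the stated hyperparameters (AdamW, global batch size 256, learning rate 1e-4, weight decay 0, global seed 0) together with the sd-vae-ft-ema latent encoder and the ImageNet 2012 training set. It is a minimal sketch under stated assumptions, not the authors' pipeline: the denoiser is a stand-in module (the real U-DiT-B/L architectures live in the linked repository), the loss is a simplified noise-prediction objective, and the dataset path is a placeholder.

```python
# Minimal sketch of the reported training configuration (not the official U-DiT code).
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from diffusers.models import AutoencoderKL

torch.manual_seed(0)  # global seed 0, as reported
device = "cuda" if torch.cuda.is_available() else "cpu"

# Latent encoder: the same sd-vae-ft-ema VAE used by latent diffusion models.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()

# Stand-in denoiser; the actual U-DiT-B model is defined in the authors' repository.
model = nn.Conv2d(4, 4, kernel_size=3, padding=1).to(device)

# Reported optimizer settings: AdamW, learning rate 1e-4, weight decay 0.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)

# ImageNet 2012 training set at 256x256 resolution, global batch size 256.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])
dataset = datasets.ImageNet("/path/to/imagenet", split="train", transform=transform)
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=8)

for images, labels in loader:
    with torch.no_grad():
        # Encode images into latents (0.18215 is the standard SD VAE scaling factor).
        latents = vae.encode(images.to(device)).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    pred = model(latents + noise)  # simplified stand-in for the diffusion objective
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the reported runs distribute this global batch of 256 across 8 A100 (80G) GPUs; the single-device loop above is only meant to make the quoted hyperparameters concrete.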