FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
Authors: Jingfeng Yao, Cheng Wang, Wenyu Liu, Xinggang Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct numerous experiments and report over one hundred experimental results to empirically summarize a unified accelerating strategy from the perspective of the probability density function (PDF) of the SNR. |
| Researcher Affiliation | Academia | Jingfeng Yao, Cheng Wang, Wenyu Liu, Xinggang Wang; School of EIC, Huazhong University of Science and Technology, Wuhan 430074, China; {jfyao, wangchust, wyliu, xgwang}@hust.edu.cn |
| Pseudocode | Yes | Figure 7: Training Details. Our training pipeline involves only minimal modifications to the code. Algorithm 2: FasterDiT Training |
| Open Source Code | Yes | Open access to the data and code is provided in supplemental material. |
| Open Datasets | Yes | Each experiment was conducted on ImageNet [13] at a resolution of 128. We train each model for 100,000 iterations and assess its performance using the FID-10k metric for comparative analysis. |
| Dataset Splits | No | The paper mentions training iterations and evaluation metrics (FID-10k, FID-50k) on ImageNet but does not explicitly provide percentages or counts for training, validation, and test dataset splits. |
| Hardware Specification | Yes | Each experiment was conducted with 8 H800 GPUs. |
| Software Dependencies | No | The paper lists optimizers and loss functions (e.g., 'Optimizer: AdamW', 'Loss Function: L_mse, L_d'), but it does not specify software dependencies such as libraries, frameworks (e.g., PyTorch, TensorFlow), or their exact version numbers. |
| Experiment Setup | Yes | The specific details of the training processes are delineated in Table 4 and Table 5. Table 4 includes 'Optimizer: AdamW', 'Learning Rate: 1e-4', 'Global Batchsize: 256', 'Training Iterations: 100,000', 'Resolution: 128', 'Loss Function: L_mse', 'Timestep Sampling: none / lognorm(0, 1) / lognorm(0, 0.5)', 'Data Augmentation: none'. Hedged sketches of the lognorm sampling and a training step follow this table. |
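
The 'Timestep Sampling' entries above reference lognorm(0, 1) and lognorm(0, 0.5). Below is a minimal sketch of such a sampler, assuming the common logit-normal reading (draw from a normal, squash through a sigmoid); the function name and signature are illustrative, not taken from the paper's code.

```python
import torch

def sample_timesteps(batch_size: int, mean: float = 0.0, std: float = 1.0) -> torch.Tensor:
    """Draw diffusion timesteps in (0, 1) via a logit-normal distribution.

    Assumption: lognorm(mean, std) in Table 4 denotes logit-normal
    sampling, i.e. u ~ N(mean, std) mapped through a sigmoid, which
    concentrates training on intermediate noise levels.
    """
    u = torch.randn(batch_size) * std + mean
    return torch.sigmoid(u)
```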
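
And a hedged sketch of one training step wired to the Table 4 settings (AdamW, learning rate 1e-4, an MSE loss); it reuses sample_timesteps from the sketch above. The toy model, tiny batch and resolution, and the linear-interpolation noising are stand-in assumptions; the authors' actual pipeline ships in their supplemental code.

```python
import torch
import torch.nn as nn
from torch.optim import AdamW

# Toy stand-in for the DiT backbone; nothing below is the authors' code.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3 * 32 * 32))
optimizer = AdamW(model.parameters(), lr=1e-4)  # Table 4: AdamW, lr 1e-4

for step in range(100):  # the paper trains for 100,000 iterations
    x = torch.randn(4, 3, 32, 32)            # stand-in batch (paper: 256 at 128px)
    t = sample_timesteps(x.size(0), std=1.0)  # lognorm(0, 1) schedule
    noise = torch.randn_like(x)
    tb = t.view(-1, 1, 1, 1)
    # Linear-interpolation noising (a rectified-flow-style assumption;
    # the paper's exact forward process may differ).
    x_t = (1 - tb) * x + tb * noise
    pred = model(x_t).view_as(x)
    loss = torch.mean((pred - (noise - x)) ** 2)  # L_mse on a velocity-style target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```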