FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification

Authors: Jingfeng Yao, Cheng Wang, Wenyu Liu, Xinggang Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct numerous experiments and report over one hundred experimental results to empirically summarize a unified accelerating strategy from the perspective of PDF.
Researcher Affiliation | Academia | Jingfeng Yao, Cheng Wang, Wenyu Liu, Xinggang Wang, School of EIC, Huazhong University of Science and Technology, Wuhan 430074, China, {jfyao, wangchust, wyliu, xgwang}@hust.edu.cn
Pseudocode | Yes | Figure 7: Training Details. Our training pipeline involves only minimal modifications to the code. Algorithm 2 FasterDiT Training
Open Source Code | Yes | Open access to the data and code is provided in supplemental material.
Open Datasets | Yes | Each experiment was conducted on ImageNet [13] at a resolution of 128. We train each model for 100,000 iterations and assess their performance using the FID-10k metric for comparative analysis.
Dataset Splits | No | The paper mentions training iterations and evaluation metrics (FID-10k, FID-50k) on ImageNet but does not explicitly provide percentages or counts for training, validation, and test dataset splits.
Hardware Specification | Yes | Each experiment has been conducted with 8 H800 GPUs.
Software Dependencies | No | The paper lists optimizers and loss functions (e.g., 'Optimizer AdamW', 'Loss Function Lmse, Ld'), but it does not specify software dependencies such as libraries or frameworks (e.g., PyTorch, TensorFlow) or their exact version numbers.
Experiment Setup | Yes | The specific details of the training processes are delineated in Table 4 and Table 5. Table 4 includes 'Optimizer AdamW', 'Learning Rate 1e-4', 'Global Batchsize 256', 'Training Iterations 100,000', 'Resolution 128', 'Loss Function Lmse', 'Timestep Sampling none / lognorm(0, 1) / lognorm(0, 0.5)', 'Data Augmentation none'.
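As a rough illustration of the "minimal modifications" described in Algorithm 2 and the Table 4 settings quoted above, the sketch below shows how a lognormal timestep sampler and the listed AdamW configuration could be wired into a standard diffusion-transformer training step. This is an assumption-based reconstruction, not the authors' code: the helper names (sample_timesteps, training_step) are hypothetical, the reading of 'lognorm(m, s)' as t = sigmoid(x) with x ~ N(m, s) follows common logit-normal practice, and the velocity-style regression target stands in for the paper's exact Lmse/Ld formulation.

```python
import torch
import torch.nn.functional as F

def sample_timesteps(batch_size, mean=0.0, std=1.0, device="cpu"):
    # Assumed reading of "lognorm(mean, std)": draw x ~ N(mean, std) and squash it
    # with a sigmoid so t lies in (0, 1), concentrating samples mid-schedule.
    x = torch.randn(batch_size, device=device) * std + mean
    return torch.sigmoid(x)

def training_step(model, optimizer, x0):
    # x0: a batch of clean latents/images, shape (B, C, H, W).
    noise = torch.randn_like(x0)
    t = sample_timesteps(x0.shape[0], mean=0.0, std=1.0, device=x0.device)
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1.0 - t_) * x0 + t_ * noise   # linear interpolation between data and noise
    target = noise - x0                  # velocity-style target (assumption, not the paper's exact loss)
    pred = model(x_t, t)                 # model predicts the velocity at (x_t, t)
    loss = F.mse_loss(pred, target)      # the Lmse term; the paper's additional Ld term is omitted here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Optimizer per Table 4: AdamW with learning rate 1e-4.
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

Drawing t from a logit-normal rather than a uniform distribution only reweights which noise levels the model sees most often during training, which matches the kind of code-level, architecture-free change the paper emphasizes.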