EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Authors: Xinwang Chen, Ning Liu, Yichen Zhu, Feifei Feng, Jian Tang

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our extensive experiments demonstrate the efficacy of EDT." |
| Researcher Affiliation | Collaboration | Xinwang Chen¹, Ning Liu¹, Yichen Zhu¹, Feifei Feng¹, Jian Tang² (¹Midea Group, ²Beijing Innovation Center of Humanoid Robotics); chen_xinwang@xs.ustb.edu.cn, ningliu1220@gmail.com, {zhuyc25, feifei.feng}@midea.com, jian.tang@x-humanoid.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our code is available here." |
| Open Datasets | Yes | The training dataset is ImageNet [32] at 256×256 and 512×512 resolution. |
| Dataset Splits | No | The paper states that "the training dataset is ImageNet [32] with 256×256 and 512×512 resolution" but does not explicitly provide the training/validation/test splits needed to reproduce the experiment. |
| Hardware Specification | Yes | Training is conducted on eight L40 48GB GPUs; the inference speed test is performed on a single L40 48GB GPU. |
| Software Dependencies | No | The paper mentions the "TensorFlow evaluation suite from ADM [4]" but does not provide version numbers for its software dependencies. |
| Experiment Setup | Yes | EDT uses the Adan [34] optimizer with a global batch size of 256 and no weight decay. The learning rate decreases linearly from 1e-3 to 5e-5 over 400k iterations. Masking training strategy: the mask ratio is set to 0.4–0.5 in the first down-sampling module and 0.1–0.2 in the second. A hedged sketch of this setup follows the table. |
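The reported hyperparameters pin down most of the training configuration. Below is a minimal PyTorch sketch of that setup, assuming the `Adan` optimizer from its reference implementation (github.com/sail-sg/Adan), a placeholder model standing in for EDT, and uniform sampling of the mask ratios within the reported ranges; everything beyond the quoted numbers (the model, the loss, the sampling distribution) is an assumption of this sketch, not the authors' code.

```python
import random

import torch
from torch.optim.lr_scheduler import LambdaLR
# Assumption: Adan from its reference implementation (github.com/sail-sg/Adan).
from adan import Adan

TOTAL_ITERS = 400_000          # reported training length
LR_START, LR_END = 1e-3, 5e-5  # reported linear learning-rate schedule
BATCH_SIZE = 256               # reported global batch size (across 8 GPUs)

# Placeholder module standing in for the EDT network (hypothetical).
model = torch.nn.Linear(8, 8)

# Adan with no weight decay, as reported in the paper.
optimizer = Adan(model.parameters(), lr=LR_START, weight_decay=0.0)

def linear_decay(step: int) -> float:
    """Multiplicative factor interpolating lr from LR_START to LR_END."""
    t = min(step, TOTAL_ITERS) / TOTAL_ITERS
    return 1.0 + (LR_END / LR_START - 1.0) * t

scheduler = LambdaLR(optimizer, lr_lambda=linear_decay)

def sample_mask_ratios() -> tuple[float, float]:
    """Sample mask ratios for the two down-sampling modules.

    The ranges (0.4-0.5 and 0.1-0.2) are from the paper; uniform
    sampling within each range is an assumption of this sketch.
    """
    return random.uniform(0.4, 0.5), random.uniform(0.1, 0.2)

for step in range(3):  # a few demo steps; real training runs TOTAL_ITERS
    ratio_1, ratio_2 = sample_mask_ratios()
    # Dummy loss in place of the masked diffusion objective.
    loss = model(torch.randn(BATCH_SIZE, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

`LambdaLR` multiplies the base learning rate by the returned factor, so the factor interpolates from 1 down to `LR_END / LR_START` = 0.05, which reproduces the reported 1e-3 to 5e-5 decay over 400k iterations.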