EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching
Authors: Xinwang Chen, Ning Liu, Yichen Zhu, Feifei Feng, Jian Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the efficacy of EDT. |
| Researcher Affiliation | Collaboration | Xinwang Chen¹, Ning Liu¹, Yichen Zhu¹, Feifei Feng¹, Jian Tang²; ¹Midea Group, ²Beijing Innovation Center of Humanoid Robotics; chen_xinwang@xs.ustb.edu.cn, ningliu1220@gmail.com, {zhuyc25, feifei.feng}@midea.com, jian.tang@x-humanoid.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available here (hyperlinked in the original paper). |
| Open Datasets | Yes | The training dataset is ImageNet [32] with 256×256 and 512×512 resolution. |
| Dataset Splits | No | The paper mentions 'The training dataset is ImageNet [32] with 256×256 and 512×512 resolution' but does not explicitly provide the training/validation/test splits needed to reproduce the experiment. |
| Hardware Specification | Yes | Training is conducted on eight L40 48GB GPUs, while the speed test for inference is performed on a single L40 48GB GPU. |
| Software Dependencies | No | The paper mentions the 'TensorFlow evaluation suite from ADM [4]' but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | EDT uses the Adan [34] optimizer with a global batch size of 256 and no weight decay. The learning rate decreases linearly from 1e-3 to 5e-5 over 400k iterations. Masking training strategy: the mask ratio is set to 0.4–0.5 in the first down-sampling module and 0.1–0.2 in the second (see the sketches below). |
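
To make the reported optimizer and schedule concrete, the snippet below sketches them in PyTorch. This is a minimal sketch, not the authors' code: the `adan-pytorch` package and the placeholder model and loss are assumptions; the paper specifies only Adan [34], a global batch size of 256, no weight decay, and a linear decay from 1e-3 to 5e-5 over 400k iterations.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LinearLR
from adan_pytorch import Adan  # assumed third-party Adan implementation; the paper cites Adan [34], not a package

model = nn.Linear(768, 768)  # placeholder standing in for the EDT model

# Adan optimizer with no weight decay, as reported in the paper.
optimizer = Adan(model.parameters(), lr=1e-3, weight_decay=0.0)

# Linear decay from 1e-3 to 5e-5 over 400k iterations:
# end_factor = 5e-5 / 1e-3 = 0.05.
scheduler = LinearLR(optimizer, start_factor=1.0, end_factor=0.05,
                     total_iters=400_000)

for step in range(400_000):          # 400k iterations, per the paper
    x = torch.randn(256, 768)        # stand-in for a 256-sample global batch
    loss = model(x).pow(2).mean()    # stand-in loss, not the diffusion objective
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```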
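
The masking strategy can be sketched the same way. The helper below (`random_token_mask`, a hypothetical name) drops a randomly drawn fraction of tokens per forward pass, with the drop ratio sampled from the range the paper reports for each down-sampling module; the paper states only the ratio ranges, not this implementation.

```python
import torch

def random_token_mask(tokens: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """Keep a random subset of tokens; the drop ratio is drawn from [lo, hi].

    tokens: (batch, num_tokens, dim). Hypothetical helper, not the authors' code.
    """
    b, n, d = tokens.shape
    ratio = torch.empty(1).uniform_(lo, hi).item()
    keep = n - int(n * ratio)
    # Independent random permutation per sample; keep the first `keep` indices.
    idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))

x = torch.randn(4, 256, 768)
h1 = random_token_mask(x, 0.4, 0.5)   # first down-sampling module: ratio 0.4-0.5
h2 = random_token_mask(h1, 0.1, 0.2)  # second down-sampling module: ratio 0.1-0.2
```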