DiTFastAttn: Attention Compression for Diffusion Transformer Models
Authors: Zhihang Yuan, Hanling Zhang, Pu Lu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DiTFastAttn using multiple DiT models, including DiT-XL (Peebles & Xie, 2023) and PixArt-Sigma (Chen et al., 2024) for image generation, and Open-Sora (Open-Sora, 2024) for video generation. Our findings demonstrate that DiTFastAttn consistently reduces the computational cost. Notably, the higher the resolution, the greater the savings in computation and latency. |
| Researcher Affiliation | Collaboration | Zhihang Yuan (1,2), Hanling Zhang (1,2), Pu Lu (1), Xuefei Ning (1), Linfeng Zhang (3), Tianchen Zhao (1,2), Shengen Yan (2), Guohao Dai (3,2), Yu Wang (1); (1) Tsinghua University, (2) Infinigence AI, (3) Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: Method for Deciding the Compression Plan (a hedged sketch of this selection procedure appears after the table) |
| Open Source Code | No | Project Website: http://nics-effalg.com/DiTFastAttn. The paper refers to a project website but does not provide a direct link to a source-code repository (e.g., GitHub, GitLab, Bitbucket) for the methodology, nor does it explicitly state that code is provided in supplementary material. |
| Open Datasets | Yes | For calculating quality metrics, we use ImageNet as the evaluation dataset for DiT and MS-COCO as the evaluation dataset for PixArt-Sigma. MS-COCO 2014 captions are used as text prompts for PixArt-Sigma image generation. |
| Dataset Splits | No | The paper states it uses ImageNet and MS-COCO for evaluation and generates 50k/30k images for quality metrics, but does not specify explicit train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits) needed for data partitioning. |
| Hardware Specification | Yes | We measure the latency per sample on a single Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions software like 'FlashAttention-2 (Dao, 2023)', 'DPM-Solver', and 'IDDPM', but does not provide specific version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | To demonstrate compatibility with fast sampling methods, we build our method upon 50-step DPM-Solver for DiT and PixArt-Sigma, and 200-step IDDPM (Nichol & Dhariwal, 2021) for Open-Sora. We use mean relative absolute error for L(O, O′) and experiment with different thresholds δ at intervals of 0.025. We denote these threshold settings as D1 (δ=0.025), D2 (δ=0.05), ..., D6 (δ=0.15), respectively. We set the window size of WA-RS to 1/8 of the token size. ... a small positive constant ϵ (set to 10⁻⁶ in our experiments)... DiT runs with a batch size of 8, while PixArt-Sigma models run with a batch size of 1. (The error metric and threshold grid are illustrated in the sketches below.) |
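
The compression-plan decision quoted in the Pseudocode row (Algorithm 1) selects, for each timestep and attention layer, the most aggressive compression technique whose output error L(O, O′) stays below a threshold δ. The following is a minimal sketch of that greedy selection in PyTorch; the function names (`decide_compression_plan`, `run_attention`), the candidate ordering, and the method identifiers are illustrative assumptions, not the authors' released implementation.

```python
import torch

def mean_relative_absolute_error(o_comp, o_full, eps=1e-6):
    # L(O, O'): mean relative absolute error between the compressed and full
    # attention outputs; eps matches the small positive constant (1e-6)
    # reported in the experiment setup.
    return (o_comp - o_full).abs().div(o_full.abs() + eps).mean().item()

def decide_compression_plan(layers, timesteps, run_attention, candidates, delta):
    """Greedy plan decision (hypothetical sketch of Algorithm 1): for each
    (timestep, layer), pick the most aggressive candidate whose error stays
    below delta, otherwise keep full attention."""
    plan = {}
    for t in timesteps:
        for layer in layers:
            o_full = run_attention(layer, t, method="full")
            chosen = "full"
            # `candidates` is assumed to be ordered from most to least
            # aggressive; the first one meeting the threshold is taken.
            for method in candidates:
                o_comp = run_attention(layer, t, method=method)
                if mean_relative_absolute_error(o_comp, o_full) < delta:
                    chosen = method
                    break
            plan[(t, layer)] = chosen
    return plan
```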
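
Building on the sketch above, a toy usage example shows how the threshold settings D1 (δ=0.025) through D6 (δ=0.15) from the experiment setup could drive the plan decision. The stand-in attention function, layer/timestep counts, and candidate method names below are placeholders, not the paper's configuration.

```python
import torch

def toy_run_attention(layer, t, method="full"):
    # Toy stand-in for the real attention computation: returns a fixed random
    # output for "full" and a slightly perturbed copy for any compressed method.
    torch.manual_seed(layer * 1000 + t)
    o_full = torch.randn(64, 1152)  # illustrative token count and hidden size
    if method == "full":
        return o_full
    return o_full + 0.01 * torch.randn_like(o_full)

# D1..D6 at intervals of 0.025, as described in the experiment setup.
threshold_settings = {f"D{i}": 0.025 * i for i in range(1, 7)}

for name, delta in threshold_settings.items():
    plan = decide_compression_plan(
        layers=range(4),                     # toy depth for the demo
        timesteps=range(5),                  # toy schedule; the paper uses 50-step DPM-Solver
        run_attention=toy_run_attention,
        candidates=["ast", "asc", "wa_rs"],  # illustrative candidate names
        delta=delta,
    )
    n_compressed = sum(v != "full" for v in plan.values())
    print(f"{name} (delta={delta:.3f}): {n_compressed} compressed (timestep, layer) pairs")
```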