Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Authors: Zhihang Yuan, Hanling Zhang, Lu Pu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DiTFastAttn using multiple DiT models, including DiT-XL (Peebles & Xie, 2023) and PixArt-Sigma (Chen et al., 2024) for image generation, and Open-Sora (Open-Sora, 2024) for video generation. Our findings demonstrate that DiTFastAttn consistently reduces the computational cost. Notably, the higher the resolution, the greater the savings in computation and latency. |
| Researcher Affiliation | Collaboration | Zhihang Yuan (1,2), Hanling Zhang (1,2), Pu Lu (1), Xuefei Ning (1), Linfeng Zhang (3), Tianchen Zhao (1,2), Shengen Yan (2), Guohao Dai (3,2), Yu Wang (1). Affiliations: (1) Tsinghua University, (2) Infinigence AI, (3) Shanghai Jiao Tong University. |
| Pseudocode | Yes | Algorithm 1: Method for Deciding the Compression Plan |
| Open Source Code | No | Project Website: http://nics-effalg.com/DiTFastAttn. The paper refers to a project website but does not provide a direct link to a source-code repository (e.g., GitHub, GitLab, Bitbucket) for the methodology, nor does it explicitly state that code is provided in supplementary material. |
| Open Datasets | Yes | For calculating quality metrics, we use ImageNet as the evaluation dataset for DiT and MS-COCO as the evaluation dataset for PixArt-Sigma. MS-COCO 2014 caption is used as text prompt for PixArt-Sigma models image generation. |
| Dataset Splits | No | The paper states it uses ImageNet and MS-COCO for evaluation and generates 50k/30k images for quality metrics, but does not specify explicit train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits) needed for data partitioning. |
| Hardware Specification | Yes | We measure the latency per sample on a single Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions software like 'FlashAttention-2 (Dao, 2023)', 'DPM-Solver', and 'IDDPM', but does not provide specific version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | To demonstrate compatibility with fast sampling methods, we build our method upon 50-step DPM-Solver for DiT and PixArt-Sigma, and 200-step IDDPM (Nichol & Dhariwal, 2021) for Open-Sora. We use mean relative absolute error for L(O, O′) and experiment with different thresholds δ at intervals of 0.025. We denote these threshold settings as D1 (δ=0.025), D2 (δ=0.05), ..., D6 (δ=0.15), respectively. We set the window size of WA-RS to 1/8 of the token size. ... a small positive constant ϵ (set to 10⁻⁶ in our experiments)... DiT runs with a batch size of 8, while PixArt-Sigma models run with a batch size of 1. |
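
The experiment setup quoted in the table can be made concrete with a minimal sketch. It is not the authors' code. The function names are hypothetical, and the exact form of the mean relative absolute error is an assumption: here each element contributes |o − o′| / (|o| + ϵ), while the paper may normalize differently. Only the values ϵ = 10⁻⁶, the threshold spacing of 0.025 (D1 to D6), and the 1/8 window fraction come from the quoted text.

```python
# Illustrative sketch only: names and the exact error formula are assumptions,
# not the authors' implementation.

EPS = 1e-6  # small positive constant from the quoted setup


def mean_relative_absolute_error(reference, compressed, eps=EPS):
    """Average of |o - o'| / (|o| + eps) over paired elements (assumed form)."""
    if len(reference) != len(compressed):
        raise ValueError("outputs must have the same length")
    if not reference:
        raise ValueError("outputs must be non-empty")
    total = 0.0
    for o, o_prime in zip(reference, compressed):
        total += abs(o - o_prime) / (abs(o) + eps)
    return total / len(reference)


def threshold_settings(step=0.025, count=6):
    """Map the labels D1..D6 to thresholds spaced `step` apart."""
    return {f"D{i}": round(i * step, 3) for i in range(1, count + 1)}


def window_size(num_tokens, divisor=8):
    """Window size as 1/divisor of the token count (1/8 in the quoted setup)."""
    return max(1, num_tokens // divisor)


if __name__ == "__main__":
    print(threshold_settings())
    print(window_size(1024))
    print(mean_relative_absolute_error([1.0, 2.0], [1.0, 1.0]))
```

Under this reading, a compression choice would be acceptable at setting D2 when its measured error against the uncompressed output stays below 0.05. How that comparison is applied per layer and per timestep is defined by Algorithm 1 in the paper and is not reproduced here.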