DiTFastAttn: Attention Compression for Diffusion Transformer Models

Authors: Zhihang Yuan, Hanling Zhang, Pu Lu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DiTFastAttn using multiple DiT models, including DiT-XL (Peebles & Xie, 2023) and PixArt-Sigma (Chen et al., 2024) for image generation, and Open-Sora (Open-Sora, 2024) for video generation. Our findings demonstrate that DiTFastAttn consistently reduces the computational cost. Notably, the higher the resolution, the greater the savings in computation and latency.
Researcher Affiliation | Collaboration | Zhihang Yuan (1,2), Hanling Zhang (1,2), Pu Lu (1), Xuefei Ning (1), Linfeng Zhang (3), Tianchen Zhao (1,2), Shengen Yan (2), Guohao Dai (3,2), Yu Wang (1); 1: Tsinghua University, 2: Infinigence AI, 3: Shanghai Jiao Tong University
Pseudocode | Yes | Algorithm 1: Method for Deciding the Compression Plan (a sketch of this selection procedure appears after the table).
Open Source Code | No | Project website: http://nics-effalg.com/DiTFastAttn. The paper refers to a project website but does not provide a direct link to a source-code repository (e.g., GitHub, GitLab, Bitbucket) for the methodology, nor does it explicitly state that code is provided in supplementary material.
Open Datasets | Yes | For calculating quality metrics, we use ImageNet as the evaluation dataset for DiT and MS-COCO as the evaluation dataset for PixArt-Sigma. MS-COCO 2014 captions are used as text prompts for PixArt-Sigma image generation.
Dataset Splits | No | The paper states that it uses ImageNet and MS-COCO for evaluation and generates 50k/30k images for quality metrics, but it does not specify explicit train/validation/test splits (percentages, sample counts, or citations to predefined splits).
Hardware Specification | Yes | We measure the latency per sample on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions software such as FlashAttention-2 (Dao, 2023), DPM-Solver, and IDDPM, but does not provide version numbers for these or any other key software components used in the experiments.
Experiment Setup | Yes | To demonstrate compatibility with fast sampling methods, we build our method upon the 50-step DPM-Solver for DiT and PixArt-Sigma, and the 200-step IDDPM (Nichol & Dhariwal, 2021) for Open-Sora. We use the mean relative absolute error for L(O, O') and experiment with different thresholds δ at intervals of 0.025. We denote these threshold settings as D1 (δ=0.025), D2 (δ=0.05), ..., D6 (δ=0.15), respectively. We set the window size of WA-RS to 1/8 of the token size. ... a small positive constant ϵ (set to 10⁻⁶ in our experiments) ... DiT runs with a batch size of 8, while PixArt-Sigma models run with a batch size of 1. (The error metric, threshold grid, and window sizing are sketched in code after the table.)
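The Pseudocode row only names Algorithm 1, but the setup row pins down the error criterion it relies on: the mean relative absolute error L(O, O') with constant ϵ and a threshold δ. Below is a minimal Python sketch of that kind of plan decision, assuming the plan is chosen by trying candidate attention variants from most to least compressed and keeping the first one whose error against full attention stays under δ; the function and variable names here are illustrative, not the authors' code.

    import torch

    def mean_relative_absolute_error(o_full, o_comp, eps=1e-6):
        # L(O, O'): element-wise |O - O'| / (|O| + eps), averaged over all
        # elements; eps = 1e-6 matches the paper's small positive constant.
        return ((o_full - o_comp).abs() / (o_full.abs() + eps)).mean().item()

    def decide_compression_plan(o_full, candidates, delta):
        # candidates: (technique_name, compressed_output) pairs, assumed
        # ordered from most to least aggressive compression. Return the
        # first technique whose output error stays under the threshold
        # delta; fall back to full attention if none qualifies.
        for name, o_comp in candidates:
            if mean_relative_absolute_error(o_full, o_comp) < delta:
                return name
        return "full_attention"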
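The remaining hyperparameters quoted in the setup row translate directly into configuration values. A short sketch, with illustrative names:

    # Threshold grid: delta steps in increments of 0.025,
    # i.e. D1 = 0.025, D2 = 0.05, ..., D6 = 0.15.
    thresholds = {f"D{k}": round(k * 0.025, 3) for k in range(1, 7)}

    # WA-RS window size is 1/8 of the token (sequence) length, e.g. a
    # 1024-token sequence gets a 128-token attention window.
    def wars_window_size(num_tokens):
        return num_tokens // 8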