Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Authors: Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Jingyi Liao, Zhao Jin, Shunyu Liu, Dacheng Tao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose VORTA, an acceleration framework with two novel components... VORTA achieves an end-to-end speedup of 1.76× without loss of quality on VBench. Furthermore, it can seamlessly integrate with various other acceleration methods, such as model caching and step distillation, reaching up to a 14.41× speedup with negligible performance degradation... This section presents the evaluation of the text-to-video generation task. We evaluate VORTA with two recently open-sourced VDiTs: HunyuanVideo [16] and Wan 2.1 [43]. |
| Researcher Affiliation | Academia | College of Computing and Data Science, Nanyang Technological University, Singapore |
| Pseudocode | Yes | Algorithm 1: 3D sliding tiled attention mask |
| Open Source Code | Yes | Codes and weights are available at https://github.com/wenhao728/VORTA. |
| Open Datasets | Yes | Following prior works [40, 60], we evaluate on the standard VBench prompt suite [11]... For router optimization, we use the Mixkit dataset [20], training for 100 steps with a learning rate of 10^-2 and a batch size of 4. |
| Dataset Splits | No | The paper mentions using the VBench prompt suite for evaluation and the Mixkit dataset for router optimization, but does not provide specific train/test/validation splits for these datasets within the main text. |
| Hardware Specification | Yes | All experiments are conducted on H100 GPUs with 80GB of memory... We demonstrate this capability by comparing the performance on the Wan 2.1 [43] using both the 50-step UniPC [59] and 30-step DPM++ [25] schedulers. Table 2 presents the VBench dimensions [11] and efficiency metrics from these experiments, which were conducted on B200 GPUs. |
| Software Dependencies | No | We implement VORTA in PyTorch [30], using FlexAttention kernel for sliding attention and FlashAttention [6, 7] kernel for all other attention operations. The paper mentions software like PyTorch, FlexAttention, and FlashAttention, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | For router optimization, we use the Mixkit dataset [20], training for 100 steps with a learning rate of 10^-2 and a batch size of 4. All experiments are conducted on H100 GPUs with 80GB of memory. Additional implementation details are provided in Appendix A.2. |
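
The router-optimization settings quoted in the table can be collected into a small configuration object, which may be handy when checking a reproduction attempt against the reported values. This is a minimal sketch, not the authors' code: the class name `RouterTrainingConfig`, its field names, and the `samples_seen` helper are hypothetical. Only the values (Mixkit, 100 steps, learning rate 10^-2, batch size 4, H100 with 80GB) come from the table. The sample count assumes the batch size is global and that no gradient accumulation is used, neither of which is stated in the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterTrainingConfig:
    """Hypothetical container for the reported router-optimization setup.

    Field names are illustrative; the default values are those quoted
    in the Experiment Setup row of the table.
    """

    dataset: str = "Mixkit"
    train_steps: int = 100
    learning_rate: float = 1e-2
    batch_size: int = 4
    gpu: str = "H100"
    gpu_memory_gb: int = 80

    def samples_seen(self) -> int:
        # Assumption: batch_size is the global batch size and there is
        # no gradient accumulation, so each step consumes one batch.
        return self.train_steps * self.batch_size


config = RouterTrainingConfig()
print(config)
print("samples seen during router training:", config.samples_seen())
```

Under those assumptions the router would see 400 training samples in total, which gives a sense of how lightweight the reported optimization stage is.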