Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Authors: Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Jingyi Liao, Zhao Jin, Shunyu Liu, Dacheng Tao

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose VORTA, an acceleration framework with two novel components... VORTA achieves an end-to-end speedup of 1.76× without loss of quality on VBench. Furthermore, it can seamlessly integrate with various other acceleration methods, such as model caching and step distillation, reaching up to a 14.41× speedup with negligible performance degradation... This section presents the evaluation of the text-to-video generation task. We evaluate VORTA with two recently open-sourced VDiTs: HunyuanVideo [16] and Wan 2.1 [43].
Researcher Affiliation | Academia | College of Computing and Data Science, Nanyang Technological University, Singapore
Pseudocode | Yes | Algorithm 1: 3D sliding tiled attention mask
Open Source Code | Yes | Codes and weights are available at https://github.com/wenhao728/VORTA.
Open Datasets | Yes | Following prior works [40, 60], we evaluate on the standard VBench prompt suite [11]... For router optimization, we use the Mixkit dataset [20], training for 100 steps with a learning rate of 10⁻² and a batch size of 4.
Dataset Splits | No | The paper uses the VBench prompt suite for evaluation and the Mixkit dataset for router optimization, but the main text does not specify train/validation/test splits for either dataset.
Hardware Specification | Yes | All experiments are conducted on H100 GPUs with 80GB of memory... We demonstrate this capability by comparing the performance on the Wan 2.1 [43] using both the 50-step UniPC [59] and 30-step DPM++ [25] schedulers. Table 2 presents the VBench dimensions [11] and efficiency metrics from these experiments, which were conducted on B200 GPUs.
Software Dependencies | No | We implement VORTA in PyTorch [30], using FlexAttention kernel for sliding attention and FlashAttention [6, 7] kernel for all other attention operations. The paper names PyTorch, FlexAttention, and FlashAttention but gives no version numbers for any of them.
Experiment Setup | Yes | For router optimization, we use the Mixkit dataset [20], training for 100 steps with a learning rate of 10⁻² and a batch size of 4. All experiments are conducted on H100 GPUs with 80GB of memory. Additional implementation details are provided in Appendix A.2.
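The pseudocode listed above (Algorithm 1, a 3D sliding tiled attention mask) is not reproduced in this report. As a rough illustration of what such a mask computes, the sketch below assumes the latent video is a T×H×W token grid split into equal tiles, and that every query token may attend to key tokens whose tile lies inside a window of tiles centered on the query's tile. The function name `sliding_tile_mask` and its parameters are hypothetical. The paper's algorithm may treat boundaries differently (for example, by shifting the window so every tile sees the same number of neighbors), so this is a simplified variant, not the authors' method.

```python
from itertools import product


def sliding_tile_mask(grid, tile, window):
    """Dense boolean mask for a simplified 3D sliding-tile attention window.

    Hypothetical illustration, not the paper's Algorithm 1.
    grid:   (T, H, W) size of the latent video in tokens
    tile:   (tt, th, tw) tile size in tokens; must divide grid evenly
    window: (wt, wh, ww) window size in tiles; assumed odd so it is centered
    Returns an N x N list of lists (N = T*H*W); mask[q][k] is True when
    query token q may attend to key token k.
    """
    if any(g % s for g, s in zip(grid, tile)):
        raise ValueError("tile size must divide grid size in every dimension")
    # Flatten tokens in row-major (t, h, w) order.
    coords = list(product(*(range(g) for g in grid)))
    # Tile coordinate of every token.
    tiles = [tuple(c // s for c, s in zip(coord, tile)) for coord in coords]
    # Window radius along each axis, measured in tiles.
    half = [w // 2 for w in window]

    def allowed(q_tile, k_tile):
        # The key tile must be within the radius on every axis; the window is
        # simply cut off at the grid boundary (no shifting in this sketch).
        return all(abs(q - k) <= r for q, k, r in zip(q_tile, k_tile, half))

    return [[allowed(qt, kt) for kt in tiles] for qt in tiles]


if __name__ == "__main__":
    # 6x6 single-frame grid, 2x2 tiles, 3x3-tile window.
    mask = sliding_tile_mask(grid=(1, 6, 6), tile=(1, 2, 2), window=(1, 3, 3))
    print(sum(map(sum, mask)), "of", len(mask) ** 2, "entries allowed")  # 784 of 1296
```

A dense N×N matrix like this is only practical for toy sizes. In a real model the same per-pair predicate would be evaluated inside the attention kernel at block granularity, which is the role the paper assigns to its FlexAttention kernel for sliding attention.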