Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Authors: Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate SVG2's quality and efficiency on representative video generative models including Hunyuan Video [1] and Wan 2.1 [2]." (Section 1, page 2). Also "5 Experiment" (Section 5, page 8). |
| Researcher Affiliation | Collaboration | University of California, Berkeley; MIT; NVIDIA; Stanford University |
| Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text, but does not include a distinct pseudocode block or algorithm section. |
| Open Source Code | Yes | "Our code is open-sourced at https://github.com/svg-project/Sparse-VideoGen." |
| Open Datasets | Yes | For text-to-video generation, the authors adopt the prompts from the Penguin Benchmark, after prompt optimization provided by the VBench team. For image-to-video generation, they adopt the prompt-image pairs provided by VBench [67] and crop the images to a 16:9 ratio for 720p resolution. |
| Dataset Splits | No | The paper mentions using prompts from the Penguin Benchmark and prompt-image pairs provided by VBench [67] for evaluation, but does not explicitly detail the dataset splits (e.g., train/validation/test percentages or sample counts) used in the experiments. |
| Hardware Specification | Yes | "On a single H100, for Hunyuan Video and Wan 2.1, SVG2 achieves up to 2.30× and 1.89× end-to-end speedup..." (Abstract, page 2). Also "We prototype SVG2 as an end-to-end framework with customized kernels from FlashInfer [16] and benchmark on NVIDIA H100 GPU with CUDA 12.8." (Section 5.1, page 8). |
| Software Dependencies | Yes | "We prototype SVG2 as an end-to-end framework with customized kernels from FlashInfer [16] and benchmark on NVIDIA H100 GPU with CUDA 12.8." |
| Experiment Setup | Yes | For SVG2, the authors choose Cq = 100 and Ck = 500 and explain this choice in Appendix D. To show the trade-off between generation quality and efficiency, they evaluate several accuracy targets (i.e., attention score recall), as detailed in Section 5.4, and sample a single data point for detailed comparison in Table 1. For all methods, sparse attention is skipped during the first 30% of denoising steps, as these steps are critical for generation quality, following previous work [64, 68, 56, 59]. |
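The Experiment Setup row names two knobs: running dense attention for the first 30% of denoising steps, and an accuracy target defined as attention score recall. The following is a minimal, illustrative sketch of what those two knobs mean, assuming a row of already-normalized attention probabilities. All function and parameter names (`use_sparse_attention`, `dense_fraction`, `select_for_recall`, `attention_recall`) are hypothetical. This is not the authors' implementation: SVG2 selects over permuted clusters of tokens, while this sketch selects individual keys.

```python
from typing import List, Sequence


def use_sparse_attention(step: int, num_steps: int, dense_fraction: float = 0.3) -> bool:
    """Hypothetical schedule: return False (run dense attention) for the first
    `dense_fraction` of denoising steps, True (run sparse attention) afterwards."""
    if not 0 <= step < num_steps:
        raise ValueError("step must be in [0, num_steps)")
    dense_steps = round(dense_fraction * num_steps)  # e.g. 15 of 50 steps
    return step >= dense_steps


def select_for_recall(probs: Sequence[float], target: float) -> List[int]:
    """Pick the fewest key indices whose attention mass reaches `target`
    (a fraction in (0, 1]) of the row's total mass, i.e. a top-p selection."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    total = sum(probs)
    # Visit keys from the largest attention probability to the smallest.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= target * total:
            break
    return sorted(chosen)


def attention_recall(probs: Sequence[float], selected: Sequence[int]) -> float:
    """Fraction of the row's attention mass covered by the selected keys."""
    return sum(probs[i] for i in selected) / sum(probs)


if __name__ == "__main__":
    row = [0.5, 0.05, 0.3, 0.1, 0.05]  # toy attention probabilities for one query
    keep = select_for_recall(row, target=0.85)
    print(keep, attention_recall(row, keep))
    print([use_sparse_attention(s, 50) for s in (0, 14, 15, 49)])
```

A higher recall target keeps more keys, which preserves quality at the cost of speed. This is the trade-off that sweeping the accuracy target is meant to expose.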