Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Authors: Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility variables (each entry lists the variable, its classified result, and the supporting LLM response quoted from the paper):
Research Type: Experimental. "We conducted extensive experiments to evaluate its effectiveness and design choices using state-of-the-art video DiTs, including CogVideoX (Yang et al., 2024), Mochi-1 (Team, 2024b), HunyuanVideo (Team, 2024c), and FastVideo (Team, 2024a). With AsymRnR, these models demonstrate significant acceleration with negligible degradation in video quality and, in some cases, even improve performance as evaluated on VBench (Huang et al., 2024). Quantitative Comparison. Table 1 provides quantitative comparisons between two configurations: a base version with perceptually near-lossless quality and a fast version that achieves higher speed at the cost of slight quality degradation. We set the matching cache step to s = 5 and the partition stride to 2×2×2 for both ToMe and AsymRnR. Our higher VBench scores and lower LPIPS, achieved at comparable FLOPs and latency, demonstrate superior video quality and semantic preservation."
Researcher Affiliation: Academia. "College of Computing and Data Science, Nanyang Technological University, Singapore; Institute for Infocomm Research (I2R), A*STAR, Singapore. Correspondence to: Rong-Cheng Tu <EMAIL>, Dacheng Tao <EMAIL>."
Pseudocode: No. "The paper describes methods using equations and prose, but does not include any clearly labeled pseudocode or algorithm blocks."
Open Source Code: Yes. "The code is available at https://github.com/wenhao728/AsymRnR."
Open Datasets: Yes. "We follow previous work and perform sampling on over 900 text prompts from the standard VBench suite (Huang et al., 2024)."
Dataset Splits: No. "The paper mentions sampling on the VBench suite but does not specify any training/test/validation splits for the experiments conducted in the paper."
Hardware Specification: Yes. "Latency is measured using an NVIDIA A100 for CogVideoX variants and an NVIDIA H100 for the rest of the models due to the availability of hardware at the time."
Software Dependencies: No. "The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA."
Experiment Setup: Yes. "The reduction schedule is defined by two hyperparameters: the similarity threshold for reduction and the reduction rates. The similarity threshold is tuned individually for each DiT model to maintain quality. ... The reduction rates are adjusted to achieve the desired acceleration (e.g., a 1.30× speedup). All schedule specifications are summarized in Table 8."
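For intuition about how a similarity threshold and a reduction rate jointly govern token reduction and restoration, the sketch below implements a ToMe-style bipartite matching step (ToMe is the baseline the paper compares against; this is not AsymRnR's actual algorithm). The even/odd token partition, the function names, and the `threshold`/`max_rate` parameters are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def bipartite_reduce(x, threshold=0.9, max_rate=0.5):
    """Drop 'src' tokens that are near-duplicates of 'dst' tokens."""
    n, _ = x.shape
    dst_idx = torch.arange(0, n, 2)   # even positions: always kept
    src_idx = torch.arange(1, n, 2)   # odd positions: reduction candidates
    xn = F.normalize(x, dim=-1)
    sim = xn[src_idx] @ xn[dst_idx].T          # (n_src, n_dst) cosine sims
    best_sim, best_dst = sim.max(dim=-1)       # each src token's closest dst
    # Drop at most max_rate of src tokens, and only those above the threshold.
    k = min(int(max_rate * len(src_idx)), int((best_sim > threshold).sum()))
    order = torch.argsort(best_sim, descending=True)
    drop = src_idx[order[:k]]                  # most redundant src tokens
    match = dst_idx[best_dst[order[:k]]]       # their kept counterparts
    keep_mask = torch.ones(n, dtype=torch.bool)
    keep_mask[drop] = False
    keep = keep_mask.nonzero(as_tuple=True)[0]
    return x[keep], keep, drop, match

def restore(reduced, keep, drop, match, n):
    """Rebuild the full-length sequence by copying each dropped token's match."""
    out = reduced.new_zeros(n, reduced.shape[-1])
    out[keep] = reduced
    out[drop] = out[match]
    return out
```

In this toy setting the threshold caps quality loss (only near-duplicate tokens are ever merged) while the rate controls the achieved speedup, mirroring how the paper tunes a per-model similarity threshold and adjusts reduction rates to hit a target acceleration.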