Fast and Memory-Efficient Video Diffusion Using Streamlined Inference

Authors: Zheng Zhan, Yushu Wu, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, Yanzhi Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our approach significantly reduces peak memory and computational overhead, making it feasible to generate high-quality videos on a single consumer GPU (e.g., reducing peak memory of AnimateDiff from 42GB to 11GB, featuring faster inference on 2080Ti). (See the peak-memory measurement sketch after the table.)
Researcher Affiliation | Academia | Zheng Zhan¹, Yushu Wu¹, Yifan Gong¹, Zichong Meng¹, Zhenglun Kong¹,², Changdi Yang¹, Geng Yuan³, Pu Zhao¹, Wei Niu³, Yanzhi Wang¹ (¹Northeastern University, ²Harvard University, ³University of Georgia)
Pseudocode | Yes | Algorithm 1: Key step search in step rehash. (See the illustrative sketch after the table.)
Open Source Code | Yes | Code available at: https://github.com/wuyushuwys/FMEDiffusion
Open Datasets | Yes | Zero-shot UCF-101 [33]: We sample clips from each category of the UCF-101 dataset and gather a subset of 1,000 video clips for evaluation; their action categories are used as their captions. For SVD and SVD-XT, samples are generated at a resolution of 576×1024 (14 frames for SVD, 25 frames for SVD-XT) and then resized to 240×320. For AnimateDiff, we generate samples at resolution 512×512 (16 frames). Zero-shot MSR-VTT [41]: We generated a video sample for each of the 9,940 development prompts. (See the subset-construction sketch after the table.)
Dataset Splits | No | The paper describes the datasets used for evaluation but does not specify explicit training/validation/test splits, percentages, or a clear data-partitioning methodology for reproducibility.
Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions 'Torch Metrics [1]' and implies the use of PyTorch (via a tutorial link), but it does not provide specific version numbers for these or any other software dependencies. (See the version-recording sketch after the table.)
Experiment Setup | Yes | We use pretrained weights for SVD (I2V) and AnimateDiff (T2V). We compare the proposed Streamlined Inference (using 13 full computation steps) with the original inference (using 25 full computation steps) and the naïve slicing inference mentioned in Sec. 3. (See the baseline inference sketch after the table.)
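
For the Research Type row, the paper's headline numbers (42GB → 11GB peak memory for AnimateDiff) can be checked with PyTorch's built-in CUDA memory counters. The helper below is a minimal sketch, not the authors' measurement code; the function name and the stand-in workload are ours.

```python
import torch

def run_with_peak_memory(fn, *args, **kwargs):
    """Run fn once on the current CUDA device and report peak allocated memory (GiB)."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out, torch.cuda.max_memory_allocated() / 2**30

# Tiny usage example with a stand-in workload (replace with a diffusion pipeline call):
model = torch.nn.Conv2d(3, 64, 3).cuda()
x = torch.randn(1, 3, 512, 512, device="cuda")
_, peak_gib = run_with_peak_memory(model, x)
print(f"peak memory: {peak_gib:.2f} GiB")
```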
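For the Pseudocode row, the paper's Algorithm 1 searches for "key" denoising steps that must be fully computed, with remaining steps reusing (rehashing) cached features. The details of Algorithm 1 are not quoted here, so the following is only a hypothetical reconstruction of the idea: greedily mark a step as key once its features drift too far from the last key step. The function name, the cosine-similarity criterion, and the threshold value are all our assumptions.

```python
import torch
import torch.nn.functional as F

def search_key_steps(step_features, threshold=0.95):
    """Illustrative greedy key-step selection (hypothetical; not the paper's Algorithm 1).

    step_features: list of per-step feature tensors from a calibration run.
    Returns the indices of steps to compute fully; all other steps would
    reuse the cached features of the most recent key step.
    """
    key_steps = [0]                       # the first step is always fully computed
    ref = step_features[0].flatten()
    for t in range(1, len(step_features)):
        cur = step_features[t].flatten()
        sim = F.cosine_similarity(ref, cur, dim=0)
        if sim < threshold:               # features drifted too far: full computation
            key_steps.append(t)
            ref = cur
    return key_steps
```

Under this sketch, a 0.95 threshold over a 25-step calibration trace would be tuned until roughly 13 key steps remain, matching the budget quoted in the Experiment Setup row.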
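For the Open Datasets row, the 1,000-clip UCF-101 evaluation subset (captions = action categories) can be assembled as below. This is a sketch assuming the standard UCF-101 directory layout (one folder per action category); the function name, sampling scheme, and seed are our assumptions, since the paper does not specify them.

```python
import os
import random

def build_ucf101_subset(root, n_total=1000, seed=0):
    """Sample an evaluation subset from UCF-101, using the category folder name as the caption."""
    rng = random.Random(seed)
    categories = sorted(d for d in os.listdir(root)
                        if os.path.isdir(os.path.join(root, d)))
    per_cat = max(1, n_total // len(categories))   # UCF-101 has 101 categories
    subset = []
    for cat in categories:
        clips = sorted(os.listdir(os.path.join(root, cat)))
        for clip in rng.sample(clips, min(per_cat, len(clips))):
            subset.append({"video": os.path.join(root, cat, clip), "caption": cat})
    return subset[:n_total]
```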
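For the Software Dependencies row, since the paper pins no versions, anyone reproducing the results should record the exact versions of their own stack. The snippet below does this with the standard library; the package list is an assumption about the likely evaluation stack, not one documented by the paper.

```python
from importlib.metadata import version, PackageNotFoundError

# Record exact library versions for reproducibility; package names are assumed.
for pkg in ("torch", "torchmetrics", "diffusers", "transformers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```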
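For the Experiment Setup row, a baseline SVD (I2V) run with the 25-step budget can be reproduced with Hugging Face diffusers as sketched below. The model ID, input image, and arguments are assumptions about a standard setup, not the authors' released code. Note that simply lowering num_inference_steps to 13 is not the paper's step rehash (which reuses cached features at non-key steps); the snippet only shows where the 25- vs. 13-step budgets in the table come from.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

# Baseline SVD image-to-video inference (assumed setup, fp16 on a single GPU).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("conditioning_frame.png")   # hypothetical conditioning frame
# Original inference: 25 full denoising steps, 14 frames at 576x1024.
frames = pipe(image, num_frames=14, num_inference_steps=25).frames[0]
```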