Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Extensive experiments show that our MCM achieves the state-of-the-art video diffusion distillation performance. Additionally, our method can enhance frame quality in video diffusion models, producing frames with high aesthetic scores or specific styles without corresponding video data. ... We conduct extensive experiments, demonstrating that our MCM significantly improves video diffusion distillation performance. Furthermore, when leveraging an additional image dataset, our MCM better aligns the appearance of the generated video with the high-quality image dataset.'
Researcher Affiliation | Collaboration | State University of New York at Buffalo; Microsoft. Emails: {yzhai6,doermann,jsyuan}@buffalo.edu, {keli,zhengyang,lindsey.li,jianfw,chungching.lin,lijuanw}@microsoft.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'Our model, code, and data will be made public available upon acceptance.' This indicates a planned future release, not concrete access at the time of publication.
Open Datasets | Yes | 'We choose two text-to-video diffusion models for experiments: ModelScope T2V [62] and AnimateDiff [19] with Stable Diffusion v1.5 [45]. We use the WebVid-2M [5] as both the video and image training dataset, without using any additional image datasets.'
Dataset Splits | No | The paper mentions training, validation, and test sets (e.g., 'we randomly sample 500 validation videos from WebVid-2M (WebVid mini) for in-distribution evaluation; we also follow common practice [62, 16] to use 2900 validation videos from MSR-VTT [70] for zero-shot generation evaluation.'), but it does not give split percentages or absolute counts for all partitions, so the data partitioning cannot be reproduced directly (see the sampling sketch after the table).
Hardware Specification | Yes | The experiments are conducted on a machine equipped with 32 H100 GPUs.
Software Dependencies | No | The paper lists software such as PyTorch [4], Diffusers [60], and PEFT [40] but does not specify their version numbers (see the version-recording sketch after the table).
Experiment Setup | Yes | 'The learning rates for the diffusion model and discriminator are set to 5e-6 and 5e-5, respectively, with batch size 128, Adam optimizer [30], and 30k training steps. The weight hyperparameters are determined via a grid search: λ_adv = 1 and λ_real = 0.5.' (See the training-loop sketch after the table.)
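
For the Dataset Splits row, a minimal sketch of how the 500-video 'WebVid mini' validation subset could be reconstructed. The metadata file name, column name, and random seed below are assumptions, since the paper reports only the subset size:

```python
# Hypothetical reconstruction of the 'WebVid mini' validation subset: 500
# videos sampled at random from WebVid-2M. The CSV path, column name, and
# seed are assumptions; the paper does not report them.
import random

import pandas as pd

SEED = 0  # assumed; the paper does not report a sampling seed

index = pd.read_csv("webvid_2m_val_metadata.csv")  # hypothetical metadata file
rng = random.Random(SEED)
webvid_mini = rng.sample(index["videoid"].tolist(), 500)

pd.DataFrame({"videoid": webvid_mini}).to_csv("webvid_mini.csv", index=False)
```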
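For the Software Dependencies row, a small script that records the installed versions of the packages the paper names. It pins whatever is present in the current environment, not the (unreported) versions the authors used:

```python
# Print a requirements-style version line for each package the paper names
# (PyTorch, Diffusers, PEFT), so the current environment can be reproduced.
import importlib.metadata as metadata

for pkg in ("torch", "diffusers", "peft"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"# {pkg} is not installed")
```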
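For the Experiment Setup row, a minimal PyTorch sketch of the reported optimization configuration: Adam at 5e-6 for the student diffusion model and 5e-5 for the discriminator, batch size 128, 30k steps, and loss weights λ_adv = 1 and λ_real = 0.5. The networks and loss terms are placeholders, not the authors' MCM objective:

```python
# Sketch of the reported training configuration. The two Linear modules and
# the loss terms are placeholders standing in for the student diffusion
# model, the discriminator, and the MCM losses, which the row above does
# not describe in full.
import torch

student = torch.nn.Linear(8, 8)        # placeholder: distilled diffusion model
discriminator = torch.nn.Linear(8, 1)  # placeholder: latent discriminator

opt_student = torch.optim.Adam(student.parameters(), lr=5e-6)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=5e-5)

lambda_adv, lambda_real = 1.0, 0.5     # grid-searched weights from the paper
batch_size, num_steps = 128, 30_000

for step in range(num_steps):
    real = torch.randn(batch_size, 8)  # stand-in for a training batch
    fake = student(real)

    # Student update: placeholder consistency loss plus weighted adversarial term.
    loss = fake.pow(2).mean() + lambda_adv * (-discriminator(fake).mean())
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()

    # Discriminator update: real-data term weighted by lambda_real (placeholder form).
    d_loss = lambda_real * (-discriminator(real).mean()) + discriminator(fake.detach()).mean()
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()
```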