reproducibilityindex.ai

MotionBooth: Motion-Aware Customized Text-to-Video Generation

Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive quantitative and qualitative evaluations demonstrate the superiority and effectiveness of our method.
Researcher Affiliation	Collaboration	Jianzong Wu (1,3), Xiangtai Li (2,3), Yanhong Zeng (3), Jiangning Zhang (4), Qianyu Zhou (5), Yining Li (3), Kai Chen (3), Yunhai Tong (1). Affiliations: (1) PKU; (2) S-Lab, NTU; (3) Shanghai AI Laboratory; (4) ZJU; (5) SJTU.
Pseudocode	Yes	Pseudo-code of latent shift. To present the latent shift module more clearly, we show the pseudo-code of the algorithm in Fig. 14.
Open Source Code	No	We are not able to provide the code at submission time. But we are making sure that our code and models will be released publicly in the future.
Open Datasets	Yes	For customization, we collect a total of 26 objects from DreamBooth [42] and Custom Diffusion [30].
Dataset Splits	No	The paper describes the datasets used for customization and evaluation, but does not explicitly provide training, validation, and test dataset splits with percentages or sample counts for these datasets.
Hardware Specification	Yes	The training process finishes in around 10 minutes in a single NVIDIA A100 80G GPU.
Software Dependencies	No	The paper mentions software components like 'AdamW optimizer' and 'DDIM scheduler' but does not specify their version numbers or the versions of other key software dependencies like PyTorch.
Experiment Setup	Yes	We train MotionBooth for 300 steps using the AdamW optimizer, with a learning rate of 5e-2 and a weight decay of 1e-2... The loss weight parameters λ1 and λ2 are set to 1.0 and 0.01. We use Zeroscope and LaVie as base models. During inference, we perform 50-step denoising using the DDIM scheduler and set the classifier-free guidance scale to 7.5. The generated videos are 576x320x24 and 512x320x16 for Zeroscope and LaVie, respectively.
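
The reported experiment setup can be collected into a small configuration sketch, which may help when attempting a reproduction. This is not the authors' code (none was released at submission time). The class names `TrainConfig` and `InferenceConfig`, the field names, and the helper `cfg_combine` are hypothetical, and reading the video sizes as width x height x frames is an assumption. The guidance helper uses the standard classifier-free guidance formula, not anything specific to MotionBooth.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters as quoted in the Experiment Setup row."""
    steps: int = 300
    optimizer: str = "AdamW"
    learning_rate: float = 5e-2
    weight_decay: float = 1e-2
    lambda_1: float = 1.0   # loss weight λ1
    lambda_2: float = 0.01  # loss weight λ2


@dataclass(frozen=True)
class InferenceConfig:
    """Inference settings; the width/height/frames ordering is an assumption."""
    base_model: str
    width: int
    height: int
    frames: int
    denoising_steps: int = 50
    scheduler: str = "DDIM"
    guidance_scale: float = 7.5


# Output sizes reported for the two base models.
ZEROSCOPE = InferenceConfig("Zeroscope", width=576, height=320, frames=24)
LAVIE = InferenceConfig("LaVie", width=512, height=320, frames=16)


def cfg_combine(eps_uncond: List[float], eps_cond: List[float],
                scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

In a real reproduction these values would be passed to the training and sampling code of the chosen base model. Library versions are not given in the paper, so they would have to be chosen by the person reproducing the work.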