MotionBooth: Motion-Aware Customized Text-to-Video Generation

Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive quantitative and qualitative evaluations demonstrate the superiority and effectiveness of our method.
Researcher Affiliation | Collaboration | Jianzong Wu (1,3), Xiangtai Li (2,3), Yanhong Zeng (3), Jiangning Zhang (4), Qianyu Zhou (5), Yining Li (3), Kai Chen (3), Yunhai Tong (1); 1 PKU, 2 S-Lab NTU, 3 Shanghai AI Laboratory, 4 ZJU, 5 SJTU
Pseudocode | Yes | Pseudo-code of latent shift. To present the latent shift module more clearly, we show the pseudo-code of the algorithm in Fig. 14. (See the illustrative latent-shift sketch after this table.)
Open Source Code | No | We are not able to provide the code at submission time. But we are making sure that our code and models will be released publicly in the future.
Open Datasets | Yes | For customization, we collect a total of 26 objects from DreamBooth [42] and Custom Diffusion [30].
Dataset Splits | No | The paper describes the datasets used for customization and evaluation, but does not explicitly provide training, validation, and test splits as percentages or sample counts.
Hardware Specification | Yes | The training process finishes in around 10 minutes on a single NVIDIA A100 80G GPU.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and the DDIM scheduler, but does not specify their versions, nor the versions of other key dependencies such as PyTorch.
Experiment Setup | Yes | We train MotionBooth for 300 steps using the AdamW optimizer, with a learning rate of 5e-2 and a weight decay of 1e-2... The loss weight parameters λ1 and λ2 are set to 1.0 and 0.01. We use Zeroscope and LaVie as base models. During inference, we perform 50-step denoising using the DDIM scheduler and set the classifier-free guidance scale to 7.5. The generated videos are 576x320x24 and 512x320x16 for Zeroscope and LaVie, respectively. (A hyper-parameter sketch follows the table.)
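
The Pseudocode row above refers to a latent shift module whose pseudo-code appears in Fig. 14 of the paper. As an illustration only, the following minimal sketch shows how such a shift could be applied to video latents for camera-motion control; the (B, C, T, H, W) tensor layout, the per-frame cumulative shift, and the use of torch.roll wrap-around are assumptions for this sketch, not the authors' released algorithm.

```python
# Minimal sketch of a latent-shift step for camera-motion control.
# Assumptions (not from the paper's code): latents have shape (B, C, T, H, W);
# dx/dy are per-frame shifts in latent pixels; wrap-around (torch.roll) handles
# the region shifted into view. The paper's Fig. 14 pseudo-code may differ.
import torch

def latent_shift(latents: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Translate video latents by (dx, dy) latent pixels per frame.

    latents: (B, C, T, H, W) noisy latents at the current denoising step.
    dx, dy:  horizontal / vertical shift applied cumulatively over frames,
             approximating a constant panning camera.
    """
    b, c, t, h, w = latents.shape
    shifted = latents.clone()
    for f in range(t):
        # Each later frame is shifted further, simulating steady camera motion.
        shifted[:, :, f] = torch.roll(
            latents[:, :, f], shifts=(f * dy, f * dx), dims=(-2, -1)
        )
    return shifted
```

In a denoising loop, such a shift would be applied to the latents at each (or selected) timesteps so that the generated content drifts consistently across frames.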
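
The Experiment Setup row reports concrete hyper-parameters. The sketch below simply collects them into runnable PyTorch configuration code; only the numeric values come from the quoted setup, while the `build_optimizer` helper, the `model` argument, and the variable names are hypothetical.

```python
# Hedged sketch collecting the reported hyper-parameters; not the authors' code.
import torch

TRAIN_STEPS = 300                     # fine-tuning steps
LAMBDA_1, LAMBDA_2 = 1.0, 0.01        # loss weights lambda_1 / lambda_2
NUM_INFERENCE_STEPS = 50              # DDIM denoising steps
GUIDANCE_SCALE = 7.5                  # classifier-free guidance scale
VIDEO_SHAPE = {                       # (width, height, frames) per base model
    "Zeroscope": (576, 320, 24),
    "LaVie": (512, 320, 16),
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # AdamW with lr 5e-2 and weight decay 1e-2, as reported in the paper.
    return torch.optim.AdamW(model.parameters(), lr=5e-2, weight_decay=1e-2)
```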