MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
Authors: Fuming You, Minghui Fang, Li Tang, Rongjie Huang, Yongqi Wang, Zhou Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MoMu-Diffusion surpasses recent state-of-the-art methods both qualitatively and quantitatively, and can synthesize realistic, diverse, long-term, and beat-matched music or motion sequences. We have conducted extensive experiments on three motion-to-music and two music-to-motion datasets, including scenarios such as dancing and competitive sports. |
| Researcher Affiliation | Academia | Fuming You, Minghui Fang, Li Tang, Rongjie Huang, Yongqi Wang, Zhou Zhao Zhejiang University fumyou13@gmail.com |
| Pseudocode | Yes | We provide the pseudo-codes of cross-modal generation and multi-modal joint generation in Algorithm 1 and 2, respectively. |
| Open Source Code | Yes | The generated samples and codes are available at https://momu-diffusion.github.io/. |
| Open Datasets | Yes | We evaluate our method on the latest LORIS benchmark [49], which contains 86.43 hours of video samples synchronized with music. This benchmark presents three demanding scenarios: AIST++ Dance [30], Floor Exercise [42], and Figure Skating [47, 46]. ... We use two datasets: AIST++ Dance [30] and BHS Dance [26]. |
| Dataset Splits | Yes | In our experiments, each dataset is randomly split with a 90%/5%/5% proportion for training, validation, and testing. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | We use 8 NVIDIA 4090 GPUs and it takes about 12 hours to finish. It takes about 2 days on 8 NVIDIA 4090 GPUs. |
| Software Dependencies | No | Here is the Python code based on the Librosa library: librosa.onset.onset_detect(y=audio, sr=sampling_rate, wait=1, delta=0.2, pre_avg=3, post_avg=3, pre_max=3, post_max=3, units='time'). OpenPose [3] is applied to extract 2D body keypoints. The paper mentions software like Librosa and OpenPose, but does not provide version numbers for these components or for the other libraries used in the experiments. (A runnable version of the Librosa snippet follows the table.) |
| Experiment Setup | Yes | The detailed hyper-parameters of BiCoR-VAE are listed in Table 8. The hyper-parameters of our FFT model are listed in Table 9. For training BiCoR-VAE, we use the AdamW optimizer with a learning rate of 2e-4 and 300 training epochs. The FFT diffusion model is trained with the AdamW optimizer [23], a learning rate of 1.6e-5, and a lambda linear scheduler with a warmup of 10,000 steps. We train the diffusion model for 200 epochs on each task. (A hedged optimizer/scheduler sketch follows the table.) |
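For reference, a minimal sketch of the 90%/5%/5% random split reported above. The function name, seed, and index-based interface are illustrative assumptions; the paper does not describe its splitting code beyond the proportions.

```python
import random

def split_dataset(indices, seed=0):
    """Randomly split sample indices 90%/5%/5% into train/val/test,
    mirroring the proportions reported in the paper.
    The name, seed, and interface are illustrative, not from the released code."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.90 * n)
    n_val = int(0.05 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining ~5%
    return train, val, test

train_ids, val_ids, test_ids = split_dataset(range(1000))
```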
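The Librosa call quoted in the Software Dependencies row can be made runnable as follows. The peak-picking parameters are the ones reported in the paper; the audio file path is a placeholder, and since no Librosa version is given, the default 22.05 kHz resampling of `librosa.load` is assumed.

```python
import librosa

# Load an audio file (path is a placeholder);
# librosa resamples to 22.05 kHz by default.
audio, sampling_rate = librosa.load("sample_music.wav")

# Onset detection with the peak-picking parameters quoted in the paper.
onsets = librosa.onset.onset_detect(
    y=audio,
    sr=sampling_rate,
    wait=1,                  # minimum frames between consecutive onsets
    delta=0.2,               # threshold offset for peak picking
    pre_avg=3, post_avg=3,   # window (frames) for the moving-average threshold
    pre_max=3, post_max=3,   # window (frames) for the local-maximum test
    units="time",            # return onset times in seconds, not frame indices
)
print(onsets)
```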
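Finally, a hedged PyTorch sketch of the reported diffusion-model optimizer configuration (AdamW, learning rate 1.6e-5, lambda linear scheduler with 10,000 warmup steps). The "lambda linear scheduler" is interpreted here as a linear warmup implemented with `LambdaLR`; the stand-in model, the exact warmup shape, and the loop are assumptions, not the authors' released code.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(512, 512)  # stand-in for the FFT diffusion model

# Hyper-parameters as reported: AdamW with lr 1.6e-5, 10,000 warmup steps.
optimizer = AdamW(model.parameters(), lr=1.6e-5)
warmup_steps = 10_000

def lambda_linear(step):
    # Linear warmup from 0 to 1 over `warmup_steps`, then constant;
    # one plausible reading of the paper's "lambda linear scheduler".
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=lambda_linear)

for step in range(100):  # training-loop placeholder
    optimizer.step()     # gradients would be computed by a real loss/backward
    scheduler.step()
```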