MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation

Authors: Nhat M. Hoang, Kehong Gong, Chuan Guo, Michael Bi Mi

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several benchmarks demonstrate that our MotionMix, as a versatile framework, consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
Researcher Affiliation | Collaboration | Huawei Technologies Co., Ltd.; Nanyang Technological University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | MDM (Tevet et al. 2022) for the text-to-motion task on HumanML3D (Guo et al. 2022b) and KIT-ML (Plappert, Mandery, and Asfour 2016), as well as the action-to-motion task on HumanAct12 (Guo et al. 2020) and UESTC (Ji et al. 2018); and EDGE (Tseng, Castellon, and Liu 2022) for the music-to-dance task on AIST++ (Li et al. 2021).
Dataset Splits | No | The paper mentions partitioning the training dataset into "noisy" and "clean" subsets but does not specify a separate validation split with percentages or counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used in the experiments.
Experiment Setup | Yes | On both datasets, we train the MDM and MotionDiffuse models from scratch for 700K and 200K steps, respectively. To approximate the noisy motion data x̂ from x ∈ ℝ^{N×D}, we use noisy ranges [20, 60] and [20, 40] for HumanML3D and KIT-ML, respectively. Following the experimental setup by Tevet et al., we train the MDM (MotionMix) from scratch on the HumanAct12 and UESTC datasets for 750K and 2M steps, respectively. Following the setup of Tseng, Castellon, and Liu, we train both the EDGE model and our EDGE (MotionMix) from scratch on AIST++ for 2000 epochs.
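
The "Dataset Splits" row above notes that the paper partitions its training data into "noisy" and "clean" subsets without stating proportions. The sketch below illustrates what such a partition could look like; it is not the authors' code, and the 50/50 ratio, function name, and seed are assumptions made only for illustration.

```python
import numpy as np

def partition_dataset(motions, noisy_fraction=0.5, seed=0):
    """Shuffle and split a list of motion sequences into clean/noisy subsets.

    noisy_fraction=0.5 is an illustrative assumption; the paper's exact
    proportion is not stated in this report.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(motions))
    n_noisy = int(len(motions) * noisy_fraction)
    noisy_subset = [motions[i] for i in indices[:n_noisy]]
    clean_subset = [motions[i] for i in indices[n_noisy:]]
    return clean_subset, noisy_subset
```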
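
The noisy ranges in the "Experiment Setup" row ([20, 60] for HumanML3D, [20, 40] for KIT-ML) refer to diffusion timesteps used to approximate noisy motion x̂ from clean motion x ∈ ℝ^{N×D}. A minimal sketch of corrupting a clean motion with the standard DDPM forward process at a timestep drawn from such a range follows; the linear beta schedule and all names here are assumptions, and may differ from MotionMix's actual implementation.

```python
import torch

def make_alpha_bars(num_steps=1000):
    # Cumulative product of (1 - beta_t) under a linear DDPM beta schedule
    # (the schedule is an assumption; the paper may use a different one).
    betas = torch.linspace(1e-4, 2e-2, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

def approximate_noisy_motion(x0, t_range=(20, 60), alpha_bars=None):
    # x0: clean motion tensor of shape [N, D] (N frames, D pose features).
    # Draw a diffusion timestep uniformly from the inclusive noisy range,
    # e.g. [20, 60] for HumanML3D or [20, 40] for KIT-ML, then apply the
    # standard forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
    if alpha_bars is None:
        alpha_bars = make_alpha_bars()
    t = torch.randint(t_range[0], t_range[1] + 1, (1,)).item()
    a_bar = alpha_bars[t]
    eps = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

# Usage with HumanML3D-like dimensions (196 frames, 263 pose features):
# x_noisy = approximate_noisy_motion(torch.randn(196, 263), t_range=(20, 60))
```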