Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Authors: Yuxuan Bian, Ailing Zeng, Xuan Ju, Xian Liu, Zhaoyang Zhang, Wei Liu, Qiang Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that MotionCraft achieves state-of-the-art performance on various standard motion generation tasks. Through detailed ablation studies, we provide key insights into architectural design choices and scaling effects for future multimodal whole-body motion generation models. |
| Researcher Affiliation | Collaboration | 1The Chinese University of Hong Kong, Hong Kong SAR, China 2Tencent, Guangdong Province, China |
| Pseudocode | No | The paper describes the architecture and method design in narrative text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://cure-lab.github.io/MotionCraft |
| Open Datasets | Yes | We create MC-Bench, the first publicly available multimodal whole-body motion generation benchmark with a unified whole-body motion representation, SMPL-X, overcoming the motion format inconsistency of existing benchmarks. To prevent information loss when aligning different motion formats, we select HumanML3D (Guo et al. 2022) in SMPL format for T2M, FineDance (Li et al. 2023) in SMPL-H Rot-6D format for M2D, and BEAT2 (Liu et al. 2024a) in SMPL-X format for S2G from public datasets, as they are the most representative unimodal datasets in their respective areas. |
| Dataset Splits | No | The paper mentions using specific datasets for training (HumanML3D, BEAT2, FineDance) and discusses evaluation metrics, but the provided text does not give explicit training/validation/test split percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions frameworks or representations such as SMPL-X, Rot-6D, SMPL, FLAME, and OpenTMR, but does not provide specific version numbers for any software libraries, tools, or programming languages used. |
| Experiment Setup | Yes | We designed two model variants for the first stage of Text-to-Motion backbone training, MotionCraft-Basic and MotionCraft-Mix, which were trained on the HumanML3D subset of MC-Bench and the entire MC-Bench, respectively. MotionCraft-Basic and MotionCraft-Mix share the same 4-layer transformer backbone configuration, dividing the body topology into 12 parts, each with a body-part hidden encoding dimension of 64. In the second stage, we used BEAT2 (Liu et al. 2024a), a large dataset for speech-gesture synthesis, and FineDance (Li et al. 2023), a high-quality choreography dataset, to train control branches for Speech-to-Gesture and Music-to-Dance. MC-Bench uses a unified whole-body motion format, SMPL-X (Pavlakos et al. 2019), in axis-angle form instead of joint positions or 6D rotations. Thus, we retrained the motion and text encoders based on SMPL-X using OpenTMR (Lu et al. 2023) for evaluation. |
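The backbone configuration quoted in the Experiment Setup row (a 4-layer transformer, body topology split into 12 parts, each with a 64-dimensional hidden encoding) can be summarized as a small config object. This is a minimal sketch assuming a concatenated per-part representation; all names are illustrative and not taken from the MotionCraft codebase:

```python
from dataclasses import dataclass


# Hypothetical config mirroring the reported MotionCraft backbone
# hyperparameters; field names are illustrative, not the authors' code.
@dataclass(frozen=True)
class BackboneConfig:
    num_layers: int = 4        # transformer layers shared by Basic and Mix
    num_body_parts: int = 12   # body-topology partitions
    part_hidden_dim: int = 64  # hidden encoding dimension per body part

    @property
    def total_hidden_dim(self) -> int:
        # If per-part encodings are concatenated, the full hidden width
        # is parts x per-part dimension.
        return self.num_body_parts * self.part_hidden_dim


cfg = BackboneConfig()
print(cfg.total_hidden_dim)  # 12 * 64 = 768
```

Whether the per-part encodings are concatenated or combined some other way is an assumption here; the paper excerpt only states the per-part dimension and part count.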