HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches. In our quantitative experiments on both the HumanML3D and KIT datasets, HuTuMotion significantly outperforms existing state-of-the-art methods. Additionally, through qualitative experiments, we observe that our method generates more natural and semantically correct motions.
Researcher Affiliation | Collaboration | (1) College of Information Engineering, Northwest A&F University; (2) Tencent AI Lab; (3) School of Mathematics and Statistics, The University of Melbourne; (4) Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates
Pseudocode | Yes | Algorithm 1: Distribution optimization for representative texts
Open Source Code | No | The paper does not contain an explicit statement that the source code for the methodology is open-source, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We experiment with two text-to-motion synthesis datasets: HumanML3D (Guo et al. 2022b) and KIT (Plappert, Mandery, and Asfour 2016).
Dataset Splits | Yes | The dataset, downsampled to 12.5 FPS, is partitioned into 80% training, 5% validation, and 15% test sets.
Hardware Specification | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions using "MLD (Chen et al. 2023)" and "DDIM (Song, Meng, and Ermon 2020) as the sampler," but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU, with text embedding and latent dimensions set to 768 and 256, respectively. We set σ to 0.2 for latent sampling and use DDIM (Song, Meng, and Ermon 2020) as the denoising motion diffusion sampler. All other settings are consistent with MLD (Chen et al. 2023).
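
The Dataset Splits row quotes an 80% / 5% / 15% train/validation/test partition after downsampling to 12.5 FPS. A minimal sketch of such a split, assuming the motions are held as a simple Python list; the `downsample` helper, the 100 FPS source rate, and the fixed seed are illustrative assumptions, not details from the paper:

```python
import random

def downsample(frames, src_fps=100.0, dst_fps=12.5):
    """Keep every (src_fps / dst_fps)-th frame; the 100 FPS source rate is an assumption."""
    step = int(round(src_fps / dst_fps))
    return frames[::step]

def split_dataset(sequences, seed=0):
    """Partition into 80% train, 5% validation, 15% test, as quoted above."""
    rng = random.Random(seed)
    idx = list(range(len(sequences)))
    rng.shuffle(idx)
    n_train = int(0.80 * len(idx))
    n_val = int(0.05 * len(idx))
    train = [sequences[i] for i in idx[:n_train]]
    val = [sequences[i] for i in idx[n_train:n_train + n_val]]
    test = [sequences[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

The remainder after the train and validation slices falls to the test set, so the three parts always cover the whole dataset even when the percentages do not divide its size evenly.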
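The Experiment Setup row fixes the latent dimension at 256, the text-embedding dimension at 768, and σ = 0.2 for latent sampling, with DDIM as the denoising sampler. A minimal sketch of that sampling step, assuming diffusers' DDIMScheduler as the DDIM implementation and a hypothetical `denoiser(z, t, text_emb)` network; the 50-step inference schedule is also an assumption:

```python
import torch
from diffusers import DDIMScheduler

LATENT_DIM = 256  # latent dimension quoted above
TEXT_DIM = 768    # text-embedding dimension quoted above
SIGMA = 0.2       # std used when sampling the initial latent, quoted above

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)  # inference-step count is an assumption, not in the quote

def sample_motion_latent(denoiser, text_emb):
    """Draw z ~ N(0, SIGMA^2 I) and denoise it with DDIM, conditioned on text."""
    z = SIGMA * torch.randn(1, LATENT_DIM)
    for t in scheduler.timesteps:
        eps = denoiser(z, t, text_emb)            # hypothetical denoiser signature
        z = scheduler.step(eps, t, z).prev_sample
    return z  # a motion decoder (as in MLD) would map z back to a motion sequence
```

With a trained denoiser this would be invoked as, e.g., `sample_motion_latent(my_denoiser, torch.randn(1, TEXT_DIM))`, where the random text embedding stands in for a real text-encoder output.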