HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback
Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches. In our quantitative experiments on both the HumanML3D and KIT datasets, HuTuMotion significantly outperforms existing state-of-the-art methods. Additionally, through qualitative experiments, we observe that our method generates more natural and semantically correct motions. |
| Researcher Affiliation | Collaboration | 1. College of Information Engineering, Northwest A&F University; 2. Tencent AI Lab; 3. School of Mathematics and Statistics, The University of Melbourne; 4. Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates |
| Pseudocode | Yes | Algorithm 1: Distribution optimization for representative texts |
| Open Source Code | No | The paper does not contain an explicit statement that the source code for the methodology is open-source or provides a direct link to a code repository. |
| Open Datasets | Yes | We experiment with two text-to-motion synthesis datasets: HumanML3D (Guo et al. 2022b) and KIT (Plappert, Mandery, and Asfour 2016). |
| Dataset Splits | Yes | The dataset, downsampled to 12.5 FPS, is partitioned into 80% training, 5% validation, and 15% test sets. |
| Hardware Specification | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using "MLD (Chen et al. 2023)" and "DDIM (Song, Meng, and Ermon 2020) as the sampler," but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU, with text embedding and latent dimensions set to 768 and 256, respectively. We set σ to 0.2 for latent sampling and use DDIM (Song, Meng, and Ermon 2020) as the denoising motion diffusion sampler. All other settings are consistent with MLD (Chen et al. 2023). |
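The latent-sampling settings quoted in the Experiment Setup row (latent dimension 256, σ = 0.2) can be sketched as below. This is an illustrative sketch only: the dimensions and σ are taken from the paper's reported setup, but the function name and the use of NumPy are assumptions, not the authors' code, and the real pipeline would pass such latents to a DDIM sampler conditioned on a 768-dimensional text embedding.

```python
import numpy as np

# Values quoted in the Experiment Setup row; everything else here is illustrative.
TEXT_EMBED_DIM = 768  # dimension of the text embedding
LATENT_DIM = 256      # dimension of the motion latent
SIGMA = 0.2           # std. dev. used for latent sampling

def sample_latent(batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw initial motion latents from N(0, SIGMA^2 * I).

    Hypothetical helper: in the paper's pipeline these latents would be
    denoised by a DDIM sampler guided by the text embedding.
    """
    return SIGMA * rng.standard_normal((batch_size, LATENT_DIM))

rng = np.random.default_rng(0)
z = sample_latent(4, rng)
print(z.shape)  # (4, 256)
```

Scaling a standard normal draw by σ is equivalent to sampling from N(0, σ²I); a smaller σ concentrates the initial latents near the origin of the latent space.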