HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback
Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches. In our quantitative experiments on both the HumanML3D and KIT datasets, HuTuMotion significantly outperforms existing state-of-the-art methods. Additionally, through qualitative experiments, we observe that our method generates more natural and semantically correct motions. |
| Researcher Affiliation | Collaboration | 1. College of Information Engineering, Northwest A&F University; 2. Tencent AI Lab; 3. School of Mathematics and Statistics, The University of Melbourne; 4. Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates |
| Pseudocode | Yes | Algorithm 1: Distribution optimization for representative texts |
| Open Source Code | No | The paper does not contain an explicit statement that the source code for the methodology is open-source or provides a direct link to a code repository. |
| Open Datasets | Yes | We experiment with two text-to-motion synthesis datasets: HumanML3D (Guo et al. 2022b) and KIT (Plappert, Mandery, and Asfour 2016). |
| Dataset Splits | Yes | The dataset, downsampled to 12.5 FPS, is partitioned into 80% training, 5% validation, and 15% test sets. |
| Hardware Specification | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using "MLD (Chen et al. 2023)" and "DDIM (Song, Meng, and Ermon 2020) as the sampler," but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Our representative distribution optimization and semantically guided generation are conducted on a single NVIDIA GeForce RTX 2080 Ti GPU, with text embedding and latent dimensions set to 768 and 256, respectively. We set σ to 0.2 for latent sampling and use DDIM (Song, Meng, and Ermon 2020) as the denoising motion diffusion sampler. All other settings are consistent with MLD (Chen et al. 2023). |
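The latent-sampling settings quoted in the Experiment Setup row (latent dimension 256, σ = 0.2) can be sketched as below. This is an illustrative sketch only: the dimensions and σ are taken from the paper's reported setup, but the function name and the use of NumPy are assumptions, not the authors' code, and the real pipeline would pass such latents to a DDIM sampler conditioned on a 768-dimensional text embedding.

```python
import numpy as np

# Values quoted in the Experiment Setup row; everything else here is illustrative.
TEXT_EMBED_DIM = 768  # dimension of the text embedding
LATENT_DIM = 256      # dimension of the motion latent
SIGMA = 0.2           # std. dev. used for latent sampling

def sample_latent(batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw initial motion latents from N(0, SIGMA^2 * I).

    Hypothetical helper: in the paper's pipeline these latents would be
    denoised by a DDIM sampler guided by the text embedding.
    """
    return SIGMA * rng.standard_normal((batch_size, LATENT_DIM))

rng = np.random.default_rng(0)
z = sample_latent(4, rng)
print(z.shape)  # (4, 256)
```

Scaling a standard normal draw by σ is equivalent to sampling from N(0, σ²I); a smaller σ concentrates the initial latents near the origin of the latent space.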