HumanTOMATO: Text-aligned Whole-body Motion Generation
Authors: Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text. |
| Researcher Affiliation | Academia | Tsinghua University; International Digital Economy Academy (IDEA); School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ) |
| Pseudocode | Yes | Appendix C.1 contains 'Algorithm 1: Training procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)' and 'Algorithm 2: Inference procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)'. |
| Open Source Code | No | The paper provides a 'Project page: https://lhchen.top/HumanTOMATO', but does not explicitly state that the source code for the methodology is provided there, nor does it provide a direct link to a source code repository. |
| Open Datasets | Yes | Motion-X (Lin et al., 2023b) is the largest 3D whole-body motion-text dataset... HumanML3D (Guo et al., 2022) is currently the largest 3D body-only motion-text dataset... |
| Dataset Splits | Yes | We follow Lin et al. (2023b); Guo et al. (2022) to split these datasets into training, validation, and test sets with proportions of 80%, 5%, and 15%. (A minimal split sketch appears after the table.) |
| Hardware Specification | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴ on 4 NVIDIA Tesla A100-80GB GPUs and are tested on 1 NVIDIA Tesla A100-80GB GPU. |
| Software Dependencies | Yes | We take the Sentence-BERT (a.k.a. sBERT) (Reimers & Gurevych, 2019) as the pre-trained language model, which is more accurate than MPNet. |
| Experiment Setup | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴... Training batch size is set to 256 for both H2VQ and Hierarchical-GPT stages. Each experiment is trained for 6,000 epochs during H2VQ stages and 2,000 epochs during Hierarchical-GPT stages. Two codebook sizes are both 512. (An illustrative configuration sketch appears after the table.) |
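
The 80% / 5% / 15% split quoted in the Dataset Splits row can be reproduced with a simple ID-level shuffle. This is a minimal sketch, not the authors' released preprocessing code; the function name and the fixed seed are illustrative assumptions.

```python
# Hypothetical sketch of the 80/5/15 train/val/test split quoted above.
# The function name and seed are illustrative; the released pipeline may differ.
import random

def split_dataset(sample_ids, seed=0):
    """Shuffle motion-sequence IDs and split them 80% / 5% / 15%."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.80 * n), int(0.05 * n)
    return (
        ids[:n_train],                 # training set
        ids[n_train:n_train + n_val],  # validation set
        ids[n_train + n_val:],         # test set
    )
```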
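
The hyperparameters quoted in the Experiment Setup row can also be summarized as a small configuration sketch. Only the numeric values come from the paper; the dictionary keys, stage identifiers, and the PyTorch optimizer helper below are assumptions for illustration, not the authors' code.

```python
# Illustrative training configuration mirroring the reported hyperparameters.
# Keys, stage names, and the helper are placeholders, not the released code.
import torch

TRAIN_CONFIG = {
    "optimizer": "AdamW",        # Loshchilov & Hutter, 2019
    "learning_rate": 1e-4,       # fixed for all experiments
    "batch_size": 256,           # both H2VQ and Hierarchical-GPT stages
    "epochs": {"h2vq": 6000, "hierarchical_gpt": 2000},
    "codebook_size": 512,        # each of the two codebooks
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """AdamW with the fixed learning rate reported in the paper."""
    return torch.optim.AdamW(model.parameters(), lr=TRAIN_CONFIG["learning_rate"])
```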