HumanTOMATO: Text-aligned Whole-body Motion Generation

Authors: Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text. |
| Researcher Affiliation | Academia | Tsinghua University; International Digital Economy Academy (IDEA); School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ) |
| Pseudocode | Yes | Appendix C.1 contains 'Algorithm 1: Training procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)' and 'Algorithm 2: Inference procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)'. |
| Open Source Code | No | The paper provides a 'Project page: https://lhchen.top/HumanTOMATO', but does not explicitly state that the source code for the methodology is provided there, nor does it provide a direct link to a source code repository. |
| Open Datasets | Yes | Motion-X (Lin et al., 2023b) is the largest 3D whole-body motion-text dataset... HumanML3D (Guo et al., 2022) is currently the largest 3D body-only motion-text dataset... |
| Dataset Splits | Yes | We follow Lin et al. (2023b); Guo et al. (2022) to split these datasets into training, validation, and test sets with proportions of 80%, 5%, and 15%. |
| Hardware Specification | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴ on 4 NVIDIA Tesla A100-80GB GPUs and are tested on 1 NVIDIA Tesla A100-80GB GPU. |
| Software Dependencies | Yes | We take the Sentence-BERT (aka sBERT) (Reimers & Gurevych, 2019) as the pre-trained language model, which is more accurate than MPNet. |
| Experiment Setup | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴... Training batch size is set to 256 for both H2VQ and Hierarchical-GPT stages. Each experiment is trained for 6,000 epochs during H2VQ stages and 2,000 epochs during Hierarchical-GPT stages. Two codebook sizes are both 512. |
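For readers cross-checking the Hardware Specification and Experiment Setup rows, the sketch below mirrors the reported optimization settings only: AdamW with a fixed learning rate of 10⁻⁴, a batch size of 256, and 512-entry codebooks. The toy data shapes and the single-codebook model are illustrative stand-ins, not the authors' released H2VQ implementation (which, per the Open Source Code row, is not linked).

```python
# Minimal PyTorch sketch of the reported optimization settings.
# Only AdamW, lr=1e-4, batch size 256, and the 512-entry codebook size come
# from the quoted implementation details; everything else is a placeholder.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in "whole-body motion" clips: 1,024 clips of 64 frames x 312 features
# (shapes chosen only for illustration).
motions = torch.randn(1024, 64, 312)
loader = DataLoader(TensorDataset(motions), batch_size=256, shuffle=True)

class ToyVQ(nn.Module):
    """Single-codebook VQ autoencoder sketch (the paper's H2VQ is a
    hierarchical, two-codebook design for body and hands)."""
    def __init__(self, feat_dim=312, code_dim=128, codebook_size=512):
        super().__init__()
        self.encode = nn.Linear(feat_dim, code_dim)
        self.decode = nn.Linear(code_dim, feat_dim)
        self.codebook = nn.Embedding(codebook_size, code_dim)  # 512 entries, as reported

    def forward(self, x):
        z = self.encode(x)                                  # (B, T, code_dim)
        flat = z.reshape(-1, z.shape[-1])                   # (B*T, code_dim)
        dist = torch.cdist(flat, self.codebook.weight)      # distance to every code
        codes = self.codebook(dist.argmin(dim=-1)).view_as(z)
        z_q = z + (codes - z).detach()                      # straight-through estimator
        recon = self.decode(z_q)
        vq_loss = nn.functional.mse_loss(codes, z.detach()) \
                + 0.25 * nn.functional.mse_loss(z, codes.detach())
        return recon, vq_loss

model = ToyVQ()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # fixed lr = 10^-4

for epoch in range(2):   # the paper reports 6,000 epochs for the H2VQ stage
    for (batch,) in loader:
        recon, vq_loss = model(batch)
        loss = nn.functional.mse_loss(recon, batch) + vq_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```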
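The Software Dependencies row cites Sentence-BERT (Reimers & Gurevych, 2019) as the pre-trained language model. A common way to obtain such sentence embeddings is the sentence-transformers package; the checkpoint name below is an assumption for illustration, since the report does not state which sBERT variant the authors load.

```python
# Hedged sketch: embedding motion captions with Sentence-BERT via the
# sentence-transformers package. The 'all-MiniLM-L6-v2' checkpoint is an
# assumed choice; the paper only states that sBERT is the text encoder.
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-MiniLM-L6-v2")   # assumed checkpoint name
captions = [
    "a person waves with the right hand while walking forward",
    "someone claps twice, then raises both arms overhead",
]
embeddings = sbert.encode(captions)               # one fixed-size vector per caption
print(embeddings.shape)                           # e.g. (2, 384) for this checkpoint
```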