Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
HumanTOMATO: Text-aligned Whole-body Motion Generation
Authors: Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text. |
| Researcher Affiliation | Academia | Tsinghua University; International Digital Economy Academy (IDEA); School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ) |
| Pseudocode | Yes | Appendix C.1 contains 'Algorithm 1: Training procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)' and 'Algorithm 2: Inference procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)'. |
| Open Source Code | No | The paper provides a project page (https://lhchen.top/HumanTOMATO) but does not explicitly state that the source code for the method is released there, nor does it link directly to a source code repository. |
| Open Datasets | Yes | Motion-X (Lin et al., 2023b) is the largest 3D whole-body motion-text dataset... HumanML3D (Guo et al., 2022) is currently the largest 3D body-only motion-text dataset... |
| Dataset Splits | Yes | We follow Lin et al. (2023b); Guo et al. (2022) to split these datasets into training, validation, and test sets with proportions of 80%, 5%, and 15%. |
| Hardware Specification | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴ on 4 NVIDIA Tesla A100-80GB GPUs and are tested on 1 NVIDIA Tesla A100-80GB GPU. |
| Software Dependencies | Yes | We take Sentence-BERT (aka sBERT) (Reimers & Gurevych, 2019) as the pre-trained language model, which is more accurate than MPNet. |
| Experiment Setup | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴... Training batch size is set to 256 for both H2VQ and Hierarchical-GPT stages. Each experiment is trained for 6,000 epochs during H2VQ stages and 2,000 epochs during Hierarchical-GPT stages. Two codebook sizes are both 512. |
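The Dataset Splits row reports an 80% / 5% / 15% train/validation/test ratio. The paper follows the splits of Lin et al. (2023b) and Guo et al. (2022), which come with the datasets themselves, so a faithful reproduction should use those official split files. The sketch below only illustrates what the stated proportions mean for a random split; the function name `split_indices` and the seeded shuffle are hypothetical, not taken from the paper.

```python
import random


def split_indices(n: int, seed: int = 0) -> tuple[list[int], list[int], list[int]]:
    """Hypothetical helper: shuffle indices 0..n-1 and cut them 80% / 5% / 15%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is repeatable
    n_train = n * 80 // 100           # integer arithmetic avoids float rounding
    n_val = n * 5 // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # the remainder (about 15%) goes to test
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 800 50 150
```

Assigning the remainder to the test set guarantees every sample lands in exactly one split even when `n` is not a multiple of 20.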
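The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be collected into one configuration object, which makes it easier to check a reproduction against the reported values. This is a minimal sketch: the class name, field names and the `steps_per_epoch` helper are hypothetical, and the excerpt does not say whether the batch size of 256 is global or per GPU.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanTomatoSetup:
    """Values as reported in the paper; all names here are hypothetical."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4            # fixed; no schedule is reported
    batch_size: int = 256                  # same for both stages (global vs. per-GPU unstated)
    h2vq_epochs: int = 6_000               # H2VQ stage
    hierarchical_gpt_epochs: int = 2_000   # Hierarchical-GPT stage
    codebook_sizes: tuple[int, int] = (512, 512)  # "Two codebook sizes are both 512"
    train_gpus: int = 4                    # NVIDIA Tesla A100-80GB
    test_gpus: int = 1

    def steps_per_epoch(self, num_train_samples: int) -> int:
        # Hypothetical helper: optimizer steps per epoch, counting a final partial batch.
        return math.ceil(num_train_samples / self.batch_size)


cfg = HumanTomatoSetup()
print(cfg.steps_per_epoch(1000))  # 4
```

If the model were built in PyTorch (the rows above do not name a framework), the optimizer line would correspond to `torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)`.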