HumanTOMATO: Text-aligned Whole-body Motion Generation

Authors: Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text. |
| Researcher Affiliation | Academia | Tsinghua University; International Digital Economy Academy (IDEA); School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ) |
| Pseudocode | Yes | Appendix C.1 contains 'Algorithm 1: Training procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)' and 'Algorithm 2: Inference procedure of Holistic Hierarchical VQ-VAE (H2VQ-VAE)'. |
| Open Source Code | No | The paper provides a 'Project page: https://lhchen.top/HumanTOMATO', but does not explicitly state that the source code for the methodology is provided there, nor does it provide a direct link to a source code repository. |
| Open Datasets | Yes | Motion-X (Lin et al., 2023b) is the largest 3D whole-body motion-text dataset... HumanML3D (Guo et al., 2022) is currently the largest 3D body-only motion-text dataset... |
| Dataset Splits | Yes | We follow Lin et al. (2023b); Guo et al. (2022) to split these datasets into training, validation, and test sets with proportions of 80%, 5%, and 15%. |
| Hardware Specification | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴ on 4 NVIDIA Tesla A100-80GB GPUs and are tested on 1 NVIDIA Tesla A100-80GB GPU. |
| Software Dependencies | Yes | We take the Sentence-BERT (aka sBERT) (Reimers & Gurevych, 2019) as the pre-trained language model, which is more accurate than MPNet. |
| Experiment Setup | Yes | All our experiments are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a fixed learning rate of 10⁻⁴... Training batch size is set to 256 for both H2VQ and Hierarchical-GPT stages. Each experiment is trained for 6,000 epochs during H2VQ stages and 2,000 epochs during Hierarchical-GPT stages. Two codebook sizes are both 512. |
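For readers cross-checking the Hardware Specification and Experiment Setup rows, the sketch below mirrors the reported optimization settings only: AdamW with a fixed learning rate of 10⁻⁴, a batch size of 256, and 512-entry codebooks. The toy data shapes and the single-codebook model are illustrative stand-ins, not the authors' released H2VQ implementation (which, per the Open Source Code row, is not linked).

```python
# Minimal PyTorch sketch of the reported optimization settings.
# Only AdamW, lr=1e-4, batch size 256, and the 512-entry codebook size come
# from the quoted implementation details; everything else is a placeholder.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in "whole-body motion" clips: 1,024 clips of 64 frames x 312 features
# (shapes chosen only for illustration).
motions = torch.randn(1024, 64, 312)
loader = DataLoader(TensorDataset(motions), batch_size=256, shuffle=True)

class ToyVQ(nn.Module):
    """Single-codebook VQ autoencoder sketch (the paper's H2VQ is a
    hierarchical, two-codebook design for body and hands)."""
    def __init__(self, feat_dim=312, code_dim=128, codebook_size=512):
        super().__init__()
        self.encode = nn.Linear(feat_dim, code_dim)
        self.decode = nn.Linear(code_dim, feat_dim)
        self.codebook = nn.Embedding(codebook_size, code_dim)  # 512 entries, as reported

    def forward(self, x):
        z = self.encode(x)                                  # (B, T, code_dim)
        flat = z.reshape(-1, z.shape[-1])                   # (B*T, code_dim)
        dist = torch.cdist(flat, self.codebook.weight)      # distance to every code
        codes = self.codebook(dist.argmin(dim=-1)).view_as(z)
        z_q = z + (codes - z).detach()                      # straight-through estimator
        recon = self.decode(z_q)
        vq_loss = nn.functional.mse_loss(codes, z.detach()) \
                + 0.25 * nn.functional.mse_loss(z, codes.detach())
        return recon, vq_loss

model = ToyVQ()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # fixed lr = 10^-4

for epoch in range(2):   # the paper reports 6,000 epochs for the H2VQ stage
    for (batch,) in loader:
        recon, vq_loss = model(batch)
        loss = nn.functional.mse_loss(recon, batch) + vq_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```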
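The Software Dependencies row cites Sentence-BERT (Reimers & Gurevych, 2019) as the pre-trained language model. A common way to obtain such sentence embeddings is the sentence-transformers package; the checkpoint name below is an assumption for illustration, since the report does not state which sBERT variant the authors load.

```python
# Hedged sketch: embedding motion captions with Sentence-BERT via the
# sentence-transformers package. The 'all-MiniLM-L6-v2' checkpoint is an
# assumed choice; the paper only states that sBERT is the text encoder.
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-MiniLM-L6-v2")   # assumed checkpoint name
captions = [
    "a person waves with the right hand while walking forward",
    "someone claps twice, then raises both arms overhead",
]
embeddings = sbert.encode(captions)               # one fixed-size vector per caption
print(embeddings.shape)                           # e.g. (2, 384) for this checkpoint
```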