MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling

Authors: Weihao Yuan, Yisheng He, Weichao Shen, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, Qixing Huang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method significantly outperforms previous methods across different datasets, with a 26.6% decrease of FID on HumanML3D and a 29.9% decrease on KIT-ML.
Researcher Affiliation | Collaboration | 1 Alibaba Group, 2 The University of Texas at Austin
Pseudocode | No | The paper describes the methodology using text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The code is not included for now, but we will release the code to the public soon.
Open Datasets | Yes | We evaluate our text-to-motion model on HumanML3D [5] and KIT-ML [55] datasets.
Dataset Splits | Yes | Following previous methods [5], 23384/1460/4383 samples are used for train/validation/test in HumanML3D, and 4888/300/830 are used for train/validation/test in KIT-ML. (See the split-count sketch after the table.)
Hardware Specification | Yes | Our framework is trained on two NVIDIA A100 GPUs with PyTorch.
Software Dependencies | No | Our framework is trained on two NVIDIA A100 GPUs with PyTorch. Only PyTorch is mentioned, without a specific version number or other versioned libraries.
Experiment Setup | Yes | The batch size is set to 256 and the learning rate is set to 2e-4. To quantize the motion data into our 2D structure, we restructure the pose in the datasets to a joint-based format, with a size of 12 × J. The data is then represented by the joint VQ codebook comprised of 256 codes, each with a dimension of 1024. ... The number of residual layers is set to 5 ... The transformers in our model are all set to have 6 layers, 6 heads, and 384 latent dimensions. The parameter α is set to 1 and N is set to 10. (A hedged configuration sketch follows the table.)
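The split counts quoted in the Dataset Splits row can be written down as a small lookup that a reproduction could check against. This is a minimal Python sketch only; the dictionary keys and the check_split_sizes helper are names introduced here, not part of the authors' (unreleased) code.

```python
# Split counts quoted in the Dataset Splits row above.
# Illustrative sketch: the key names and helper are hypothetical.
SPLIT_COUNTS = {
    "HumanML3D": {"train": 23384, "validation": 1460, "test": 4383},
    "KIT-ML": {"train": 4888, "validation": 300, "test": 830},
}

def check_split_sizes(dataset_name: str, splits: dict) -> None:
    """Verify that loaded splits match the counts reported in the paper."""
    expected = SPLIT_COUNTS[dataset_name]
    for split_name, expected_count in expected.items():
        actual = len(splits[split_name])
        if actual != expected_count:
            raise ValueError(
                f"{dataset_name} {split_name}: expected {expected_count} samples, got {actual}"
            )
```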
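The Experiment Setup row lists the main hyperparameters. Below is a hedged sketch of how they might be collected into a single training configuration. The class and field names (MoGenTSConfig, joint_feature_dim, and so on) are assumptions made for this report, not the authors' released code, and the roles of α and N are stated only as quoted.

```python
from dataclasses import dataclass

@dataclass
class MoGenTSConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    Field names are assumptions made for this sketch; the authors' code
    has not been released, so the exact structure may differ.
    """
    batch_size: int = 256
    learning_rate: float = 2e-4
    joint_feature_dim: int = 12      # pose restructured to a joint-based format of size 12 x J
    joint_codebook_size: int = 256   # joint VQ codebook with 256 codes
    code_dim: int = 1024             # each code has dimension 1024
    num_residual_layers: int = 5     # number of residual layers
    transformer_layers: int = 6
    transformer_heads: int = 6
    transformer_latent_dim: int = 384
    alpha: float = 1.0               # the parameter alpha quoted in the paper
    n_param: int = 10                # the parameter N quoted in the paper

cfg = MoGenTSConfig()
```

A reproduction would still need details the quoted excerpt does not give, such as the optimizer, training schedule, and the exact loss terms weighted by α, so this sketch only pins down the values reported above.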