Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Autoregressive Motion Generation with Gaussian Mixture-Guided Latent Sampling

Authors: Linnan Tu, Lingwei Meng, Zongyi Li, Hefei Ling, Shijuan Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our model surpasses existing state-of-the-art models in the motion synthesis task.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Huazhong University of Science and Technology; (2) The Chinese University of Hong Kong
Pseudocode | No | The paper includes figures (e.g., Figures 2 and 3) that illustrate the model architecture and processes, but these are diagrams, not structured pseudocode or algorithm blocks. The methodology is described using text and mathematical equations.
Open Source Code | No | The code will be made publicly available after the review process.
Open Datasets | Yes | To fairly and accurately compare our method with the baseline, we used two main motion-language benchmarks: KIT-ML (34) and HumanML3D (33). The KIT-ML dataset comprises 3,911 actions from KIT motion data, with each action accompanied by one to four text notes (a total of 6,278 notes). The KIT-ML motions are set at 12.5 frames per second (FPS). HumanML3D includes 14,616 actions from the AMASS (48) and HumanAct12 (49) datasets. Each action is described by three text scripts (a total of 44,970 notes).
Dataset Splits | No | We augmented the data by flipping motions and split both datasets into training, testing, and validation sets. However, specific percentages or sample counts for these splits are not provided.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running the experiments in the main text. It mentions in the NeurIPS Paper Checklist that 'We have provided the relevant content in the appendix,' but the appendix is not included in the provided text.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. While the NeurIPS Paper Checklist examples mention 'Python 3.8, PyTorch 1.9, and CUDA 11.1', these are examples of how to report software, not statements from the paper itself regarding its own software environment.
Experiment Setup | Yes | We use the same CNN-based encoder and decoder as MoMask (4). We introduce a linear layer after the encoder (the same as (50)) and replace the vector quantization step with a learnable Gaussian mixture distribution. To maintain training stability, we make the mean learnable, initialize the weights with a uniform distribution, and fix the variance to be the identity matrix. The dimension of the 8-layer Causal Transformer is set to 512, with 8 heads and a dropout rate of 0.1, using the GELU activation function. Learnable RoPE embeddings are applied. The covariance matrices are set to be diagonal.
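
The Experiment Setup entry above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code (which is unreleased at the time of writing). The transformer values (8 layers, width 512, 8 heads, dropout 0.1, GELU) are quoted from the entry; everything else is an assumption made for illustration: the names TransformerConfig and GaussianMixturePrior, the parameters num_components, latent_dim and init_range, and the reading of "initialize the weights with a uniform distribution" as both uniformly drawn initial means and equal mixing weights.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hyperparameters quoted in the Experiment Setup entry."""
    num_layers: int = 8
    model_dim: int = 512
    num_heads: int = 8
    dropout: float = 0.1
    activation: str = "gelu"

    @property
    def head_dim(self) -> int:
        # Per-head width implied by the quoted model width and head count.
        return self.model_dim // self.num_heads


class GaussianMixturePrior:
    """Gaussian mixture with per-component means and identity covariance.

    Assumptions (not stated in the source): num_components, latent_dim,
    init_range, and equal mixing weights. In a real model the means would
    be trainable parameters; here they are plain lists of floats.
    """

    def __init__(self, num_components, latent_dim, init_range=1.0, seed=0):
        self.rng = random.Random(seed)
        # Means start from a uniform distribution on [-init_range, init_range].
        self.means = [
            [self.rng.uniform(-init_range, init_range) for _ in range(latent_dim)]
            for _ in range(num_components)
        ]
        # Equal mixing weights (one possible reading of "uniform").
        self.weights = [1.0 / num_components] * num_components

    def sample(self):
        """Pick a component, then add unit-variance Gaussian noise to its mean."""
        k = self.rng.choices(range(len(self.means)), weights=self.weights)[0]
        latent = [mu + self.rng.gauss(0.0, 1.0) for mu in self.means[k]]
        return k, latent


config = TransformerConfig()
prior = GaussianMixturePrior(num_components=4, latent_dim=16)
component, latent = prior.sample()
```

Fixing the covariance to the identity leaves the means as the only mixture parameters to learn, which matches the stability motivation quoted in the entry. The number of components and the latent dimension are not reported there, so the values 4 and 16 are placeholders.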