Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SGAR: Structural Generative Augmentation for 3D Human Motion Retrieval

Authors: Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three benchmarks, including motion-text retrieval as well as recognition and generation applications, demonstrate the superior performance and promising transferability of our method.
Researcher Affiliation | Academia | Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu; Wangxuan Institute of Computer Technology, Peking University; EMAIL
Pseudocode | No | The paper describes the proposed method in Section 3 using textual descriptions and mathematical formulations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | Code and models will be available upon publication.
Open Datasets | Yes | HumanML3D [8] (HML3D) ... It consists of 23,384 and 4,380 motions for training and testing with a mirror augmentation, respectively. KIT-ML [26] (KIT) is a small dataset with a focus on locomotion motions... Motion-X [15] is a large-scale dataset... BABEL [27] consisting of 10892 sequences from AMASS.
Dataset Splits | Yes | HumanML3D [8] (HML3D) ... It consists of 23,384 and 4,380 motions for training and testing with a mirror augmentation, respectively. KIT-ML [26] (KIT) ... We obtain 4,888, 300, and 830 motions for training, validation and testing, respectively...
Hardware Specification | Yes | For the motion-language alignment pre-training, we conduct the experiments on a single NVIDIA A40 GPU.
Software Dependencies | No | The paper mentions using ViT-B/16 and Distill-BERT as encoders, and Adam optimizer, but does not provide specific version numbers for these software components or the underlying frameworks/libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The model is trained using Adam optimizer [12] for 50 epochs, with learning rates of 10⁻⁵, 10⁻⁴ and 10⁻³ for the text encoder, the motion encoder, and the projection heads, respectively. The embedding dimension after projection is 256 for contrastive learning. λmix and λr are set to 0.5 and 0.1. The temperature coefficients τ and τ are 0.07 and 0.05. The batch size is 128. For the input data, the patch size N is 224. We pad or crop the motions to a fixed length of 224 following ViT.
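For readers who want to carry the Experiment Setup values into their own runs, the hyperparameters quoted above can be collected in a small configuration object. This is a minimal sketch, not the authors' code: the class name, field names, and the helper `adam_param_groups` are illustrative assumptions, and only the numeric values come from the quoted text. The learning rates are read as 1e-5, 1e-4 and 1e-3, and the two temperature fields carry placeholder names because the extracted text does not distinguish the two symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""

    epochs: int = 50
    batch_size: int = 128
    embed_dim: int = 256              # embedding dimension after projection
    lr_text_encoder: float = 1e-5
    lr_motion_encoder: float = 1e-4
    lr_projection_heads: float = 1e-3
    lambda_mix: float = 0.5
    lambda_r: float = 0.1
    # Placeholder names: the source lists two temperatures, 0.07 and 0.05,
    # without recoverable subscripts.
    temperature_first: float = 0.07
    temperature_second: float = 0.05
    motion_length: int = 224          # motions are padded or cropped to this length


def adam_param_groups(cfg, text_params, motion_params, head_params):
    """Build one parameter group per module, each with its own learning rate.

    The result is a list of dicts with "params" and "lr" keys, the form that
    torch.optim.Adam accepts for per-group options. The three-way split into
    text encoder, motion encoder, and projection heads follows the quoted text.
    """
    return [
        {"params": list(text_params), "lr": cfg.lr_text_encoder},
        {"params": list(motion_params), "lr": cfg.lr_motion_encoder},
        {"params": list(head_params), "lr": cfg.lr_projection_heads},
    ]


if __name__ == "__main__":
    cfg = PretrainConfig()
    # Strings stand in for real parameter tensors in this illustration.
    for group in adam_param_groups(cfg, ["text"], ["motion"], ["heads"]):
        print(group)
```

With PyTorch installed, the returned list could be passed directly as the first argument to `torch.optim.Adam`; the sketch itself depends only on the standard library.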