Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

PyraMotion: Attentional Pyramid-Structured Motion Integration for Co-Speech 3D Gesture Synthesis

Authors: Zhizhuo Yin, Yuk Hang Tsui, Pan Hui

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Objective and subjective experiments demonstrate that the PyraMotion outperforms state-of-the-art methods in terms of generating natural and expressive full-body human gestures. Extensive ablation experiments highlight that the self-adaptiveness integration through attention maps contributes to performance.
Researcher Affiliation | Academia | (1) Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China; (2) Hong Kong University of Science and Technology, Hong Kong SAR; EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions, but it does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The source code will be released to GitHub after acceptance.
Open Datasets | Yes | We evaluate the ability of our method to generate holistic 3D gestures from speech on a diverse and expressive dataset BEAT2 [23] collected from mocap equipment. This public dataset contains 76 hours of high-quality, multi-modal data captured from 30 speakers talking with eight different emotions.
Dataset Splits | Yes | Following the settings of existing work [22, 25], we conduct the experiments on the BEAT2-Standard Speaker2 with an 85%/7.5%/7.5% train/val/test split.
Hardware Specification | Yes | The whole training process is conducted on an Ubuntu Server with 1 GPU computing card with 32 GB VRAM and 256 GB memory.
Software Dependencies | Yes | For the software environment, the model is deployed using Python 3.9, PyTorch 2.4.1.
Experiment Setup | Yes | The weight of commitment [34] loss is set to 1. [...] Adhering to the feature pyramid framework [20], which leverages exponentially increasing kernel sizes to reduce information overlap across layers, we configure successive layers with the scale sequence [1, 2, 4, 8, 16]. [...] In this experiment, we search for the optimal number of layers to balance computational efficiency with hierarchical feature extraction. The results, summarized in Table 5, show that increasing the layer number initially boosts performance but eventually leads to a decline.
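
For readers who want to check the two numeric settings quoted above, the following is a minimal sketch and not the authors' code, which has not been released. It assumes the 85%/7.5%/7.5% split is applied to a simple count of items and that the scale sequence [1, 2, 4, 8, 16] doubles once per pyramid layer. The function names `split_counts` and `pyramid_scales` are hypothetical, and the paper may split by sequence or by duration instead.

```python
# Illustrative sketch only; names and rounding behavior are assumptions,
# not taken from the paper or its (unreleased) code.

def split_counts(n_items, permille=(850, 75, 75)):
    """Return (train, val, test) counts for an 85%/7.5%/7.5% split.

    Fractions are given in parts per thousand so the arithmetic stays
    in integers and avoids floating-point rounding surprises.
    """
    assert sum(permille) == 1000, "fractions must cover the whole set"
    n_train = n_items * permille[0] // 1000
    n_val = n_items * permille[1] // 1000
    n_test = n_items - n_train - n_val  # remainder goes to the test set
    return n_train, n_val, n_test


def pyramid_scales(n_layers):
    """Return scales that double per layer: 1, 2, 4, ..."""
    return [2 ** i for i in range(n_layers)]


print(split_counts(1000))   # (850, 75, 75)
print(pyramid_scales(5))    # [1, 2, 4, 8, 16]
```

Assigning the integer-division remainder to the test set is one arbitrary choice among several. It guarantees that the three counts always sum to the total.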