Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Puppeteer: Rig and Animate Your 3D Models

Authors: Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, Jianfeng Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations across multiple benchmarks demonstrate that our method significantly outperforms state-of-the-art techniques in both skeletal prediction accuracy and skinning quality. The system robustly processes diverse 3D content, ranging from professionally designed game assets to AI-generated shapes, producing temporally coherent animations that eliminate the jittering issues common in existing methods. Extensive evaluations demonstrate the effectiveness of our approach across both rigging and animation tasks. For rigging, experiments on the expanded Articulation-XL2.0 dataset and Models Resource benchmark [76, 85] show significant improvements over state-of-the-art methods in skeleton accuracy and skinning weight quality.
Researcher Affiliation	Collaboration	Chaoyue Song (1,2), Xiu Li (2), Fan Yang (1), Zhongcong Xu (2), Jiacheng Wei (1), Fayao Liu (3), Jiashi Feng (2), Guosheng Lin (1), Jianfeng Zhang (2). (1) Nanyang Technological University; (2) ByteDance Seed; (3) Institute for Infocomm Research, A*STAR
Pseudocode	No	The paper only describes methods in paragraph form. There are no explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code	Yes	Answer: [Yes] Justification: We will open source the data and codes upon acceptance.
Open Datasets	Yes	We present Articulation-XL2.0, an expanded version of Articulation-XL proposed in [67]... We have released Articulation-XL2.0, a comprehensive collection of 59.4k high-quality rigged models, to facilitate future research. Dataset statistics and examples are provided in the appendix. Also: Models Resource benchmark [76, 85]
Dataset Splits	Yes	For model training, we utilize over 46k samples from the main subset and 10.9k from the diverse-pose subset. For evaluation, we employ three distinct test sets: Articulation-XL2.0-test (2k data from the main set), Models Resource-test [76, 86] (270 upright, front-facing models with no overlap with Articulation-XL2.0, enabling assessment of cross-dataset generalization), and a 500-mesh portion of the diverse-pose subset specifically selected to evaluate model performance under varied poses.
Hardware Specification	Yes	In the appendix, we specify that our skeleton generation model was trained on 8 NVIDIA A100 GPUs for approximately 3 days and 20 hours, the skinning weight prediction model required approximately 1 day and 6 hours on the same hardware, and the animation optimization process completes in approximately 20 minutes on a single A100 GPU.
Software Dependencies	No	We utilize differentiable rendering via PyTorch3D [57]... The tracking losses incorporate a 2D joint tracking term and a 2D vertex tracking term that leverage CoTracker3 [35]... We employ the ray_mesh_intersect function from libigl [29]... Kling AI [1] or Ji Meng AI [34]... The Geodesic Voxel Binding (GVB) [18] comparison utilizes the implementation in Autodesk Maya [27]. No specific version numbers for these software components are provided.
Experiment Setup	Yes	To enhance robustness and generalization capabilities, we apply geometric data augmentations (scaling, shifting, rotation transformations) and pose augmentation, articulating the training samples with their ground truth skeleton and skinning weights to simulate diverse poses. Further implementation details are provided in the appendix... During training, we apply pose augmentation with a probability of 0.5. When pose augmentation is applied, each joint has a 0.3 probability of rotation, with rotation angles constrained to the range [-60°, 60°]. Sequence ordering randomization is annealed following [92], with a permutation probability r that starts at 1 and falls to 0 (reverting to hierarchical order) over training... The auto-regressive transformer is trained on 8 NVIDIA A100 GPUs (batch size 64 per GPU, effective batch 512) for approximately 3 days and 20 hours... Training is performed on Articulation-XL2.0 with 8 NVIDIA A100 GPUs for roughly 1 day and 6 hours, with a batch size of 16 per GPU... regularization losses are down-weighted by 3 to 4 orders of magnitude relative to rendering and tracking losses to prevent over-smoothing.
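The Hardware Specification and Experiment Setup rows can be converted into rough GPU-hour budgets for anyone planning a reproduction. The sketch below is illustrative arithmetic only. The `TrainingRun` class and its field names are hypothetical, and the wall-clock times are the approximate values quoted in the table. The skinning model's effective batch (8 × 16) is derived under the assumption of plain data parallelism without gradient accumulation, which the source does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical record of one reported run (names are illustrative)."""
    name: str
    num_gpus: int
    days: int
    hours: float
    per_gpu_batch: Optional[int] = None  # None when the source gives no batch size

    @property
    def wall_clock_hours(self) -> float:
        return self.days * 24 + self.hours

    @property
    def gpu_hours(self) -> float:
        # Total accelerator time: GPUs multiplied by wall-clock hours.
        return self.num_gpus * self.wall_clock_hours

    @property
    def effective_batch(self) -> Optional[int]:
        # Assumption: pure data parallelism, no gradient accumulation.
        if self.per_gpu_batch is None:
            return None
        return self.num_gpus * self.per_gpu_batch


# Approximate figures as reported in the table above.
RUNS = [
    TrainingRun("skeleton generation (auto-regressive transformer)", 8, 3, 20, 64),
    TrainingRun("skinning weight prediction", 8, 1, 6, 16),
    TrainingRun("animation optimization", 1, 0, 20 / 60),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: {run.wall_clock_hours:.2f} h wall clock, "
            f"{run.gpu_hours:.2f} A100 GPU-hours, effective batch {run.effective_batch}"
        )
```

Under these assumptions, skeleton generation comes to about 8 × 92 = 736 A100 GPU-hours. Skinning weight prediction comes to about 8 × 30 = 240 GPU-hours. One animation optimization run takes roughly a third of a GPU-hour.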
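The pose augmentation in the Experiment Setup row can be sketched as a sampler that first decides per training sample, and then per joint, whether to apply a rotation. This is a minimal sketch, not the authors' code, and the function names are hypothetical. The source gives only the two probabilities and the angle bound, so the following are all assumptions: the uniform angle distribution, reading [-60°, 60°] as degrees, and the random rotation axis. The sketch does not apply the sampled local rotations to the mesh through the ground-truth skeleton and skinning weights, the step the source describes.

```python
import math
import random
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
# One entry per joint: None (joint left at rest) or (unit axis, angle in radians).
JointRotation = Optional[Tuple[Vec3, float]]


def random_unit_axis(rng: random.Random) -> Vec3:
    """Uniformly random rotation axis (assumption: the source does not say how axes are chosen)."""
    while True:
        # A normalized isotropic Gaussian vector is uniformly distributed on the unit sphere.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)


def sample_pose_augmentation(
    num_joints: int,
    rng: random.Random,
    p_apply: float = 0.5,  # per-sample probability of pose augmentation (from the source)
    p_joint: float = 0.3,  # per-joint rotation probability (from the source)
    max_angle_deg: float = 60.0,  # angle bound; degrees and uniform sampling are assumptions
) -> Optional[List[JointRotation]]:
    """Return None if this sample is not augmented, else one JointRotation per joint."""
    if rng.random() >= p_apply:
        return None
    rotations: List[JointRotation] = []
    for _ in range(num_joints):
        if rng.random() < p_joint:
            angle_deg = rng.uniform(-max_angle_deg, max_angle_deg)
            rotations.append((random_unit_axis(rng), math.radians(angle_deg)))
        else:
            rotations.append(None)
    return rotations


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_pose_augmentation(num_joints=5, rng=rng))
```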
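The annealed ordering randomization in the Experiment Setup row can be read as follows. At each step, the joint sequence is randomly permuted with probability r and otherwise kept in hierarchical order, and r decays from 1 to 0 over training. The sketch below assumes a linear decay and a plain shuffle of joint indices. The source gives neither the schedule shape nor the exact permutation scheme of [92], and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def permutation_probability(step: int, total_steps: int) -> float:
    """Probability r of permuting the joint order at a given step.

    Assumption: linear decay from 1.0 at the first step to 0.0 at the last;
    the source only says r starts at 1 and falls to 0 over training.
    """
    if total_steps <= 1:
        return 0.0
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return 1.0 - progress


def order_joints(
    hierarchical_order: Sequence[int],
    step: int,
    total_steps: int,
    rng: random.Random,
) -> List[int]:
    """Return a random permutation with probability r, else the hierarchical order."""
    order = list(hierarchical_order)
    if rng.random() < permutation_probability(step, total_steps):
        rng.shuffle(order)
    return order


if __name__ == "__main__":
    rng = random.Random(0)
    joints = list(range(6))  # hypothetical joint indices already in hierarchical order
    for step in (0, 50, 100):
        print(step, permutation_probability(step, 101), order_joints(joints, step, 101, rng))
```

A full implementation would likely also need to remap each joint's parent index after permuting. This sketch leaves that step out.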
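The loss weighting for animation optimization can be written as a weighted sum. The equation below is a hedged reconstruction, and its symbols are hypothetical. It assumes the rendering and tracking weights equal 1. The tracking loss is split into the 2D joint and 2D vertex terms named in the Software Dependencies row, and the regularization terms are collapsed into a single weighted term. Only the gap of 3 to 4 orders of magnitude comes from the source.

```latex
\mathcal{L}_{\text{anim}} = \mathcal{L}_{\text{render}}
  + \mathcal{L}_{\text{track}}^{\text{joint}} + \mathcal{L}_{\text{track}}^{\text{vertex}}
  + \lambda_{\text{reg}}\,\mathcal{L}_{\text{reg}},
\qquad \lambda_{\text{reg}} \in \left[10^{-4},\, 10^{-3}\right]
```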