Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Generalizable 3D Human Pose Estimation via Ensembles on Flat Loss Landscapes

Authors: Jumin Han, Jun-Hee Kim, Seong-Whan Lee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results demonstrate that our approach improves the generalization capability of 3D HPE models, and can be easily applied, regardless of model architecture, with consistent performance gains. Our method enhances performances of the model for the representative model architectures (MLP, CNN, GCN, and Transformer) of 3D HPE in benchmark datasets such as Human3.6M [14], MPI-INF-3DHP [24], 3DPW [30], and BEDLAM [2].
Researcher Affiliation	Academia	Jumin Han Department of Artificial Intelligence Korea University, Seoul, South Korea EMAIL Jun-Hee Kim Department of Artificial Intelligence Korea University, Seoul, South Korea EMAIL Seong-Whan Lee Department of Artificial Intelligence Korea University, Seoul, South Korea EMAIL
Pseudocode	No	The paper describes methods verbally and with mathematical equations (Eq. 1, 2, 3, 4) and diagrams (Figure 4), but does not present structured pseudocode or algorithm blocks.
Open Source Code	Yes	Answer: [Yes] Justification: The code is included in the supplementary material.
Open Datasets	Yes	Our method enhances performances of the model for the representative model architectures (MLP, CNN, GCN, and Transformer) of 3D HPE in benchmark datasets such as Human3.6M [14], MPI-INF-3DHP [24], 3DPW [30], and BEDLAM [2].
Dataset Splits	Yes	We utilize the data from subjects 1, 5, 6, 7, and 8 as training set, while the data from subjects 9 and 11 are utilized as test set following the literature of 3D HPE.
Hardware Specification	Yes	Answer: [Yes] Justification: We explain the GPU we used in the supplementary material.
Software Dependencies	No	The main text of the paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup	Yes	The results show that applying SAM to a 3D HPE model yielded no performance gain, which confirmed our hypothesis. Note that the models are trained for a longer duration because of the slow convergence of SAM and the perturbation radius is set as 0.05 for SAM training. ... To illustrate this, we compare the training loss trajectories of the model with and without adaptive scaling mechanism during 20 epochs on H36M [14] in Figure 7.
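
Illustrative note on the Experiment Setup row: the quoted passage mentions SAM training with a perturbation radius of 0.05. The sketch below shows the generic two-step sharpness-aware minimization update (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point). It is not the authors' code. The toy quadratic loss, the learning rate, and the names `sam_step` and `toy_grad` are assumptions made only for illustration; only the radius 0.05 comes from the quoted text.

```python
import math

RHO = 0.05  # perturbation radius reported in the Experiment Setup row


def sam_step(w, grad_fn, lr=0.1, rho=RHO, eps=1e-12):
    """One generic SAM update on a list of floats (lr is an assumed value)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # Ascent step: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / (norm + eps) for wi, gi in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w):
    # Gradient of the toy loss 0.5 * sum(w_i ** 2); a stand-in for a real 3D HPE loss.
    return list(w)


w = [3.0, -4.0]
w_next = sam_step(w, toy_grad)
```

Each SAM update needs two gradient evaluations, so it costs roughly twice as much per step as plain gradient descent. That is a general property of SAM and is separate from the slow convergence the quoted passage reports.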