Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty

Authors: Harry Zhang, Luca Carlone

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variables (each entry gives the classification result, followed by the supporting LLM response):
Research Type: Experimental
  "We provide a quantitative evaluation of our model against state-of-the-art baselines. We also provide ablation studies and qualitative results to support our design choices. We follow previous baselines (Shen et al., 2023; Kanazawa et al., 2018; 2019; Dwivedi et al., 2024; Choi et al., 2021) and report several intra-frame metrics, including Mean Per Joint Position Error (MPJPE), Procrustes-aligned MPJPE (PA-MPJPE), and Mean Per Vertex Position Error (MPVPE)."
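The pose metrics named in the response above are standard in the human pose-estimation literature. As a minimal illustration (not the paper's implementation), MPJPE averages the Euclidean distance between predicted and ground-truth 3D joints; PA-MPJPE applies a rigid Procrustes alignment first, and MPVPE computes the same average over mesh vertices instead of joints:

```python
import math

def mpjpe(pred_joints, gt_joints):
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth 3D joints, each given as (x, y, z) tuples."""
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)
```

For example, a prediction that is exact on one joint and off by 1 unit on another yields an MPJPE of 0.5 units (errors are typically reported in millimeters).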
Researcher Affiliation: Academia
  "Massachusetts Institute of Technology. Correspondence to: Harry Zhang <EMAIL>."

Pseudocode: No
  "The paper describes methods and processes but does not include any explicitly labeled pseudocode or algorithm blocks."

Open Source Code: No
  "3D visualization, code, and data will be available at this website."

Open Datasets: Yes
  "We follow the same dataset split and setup as done in previous works and evaluated on 3DPW (Von Marcard et al., 2018), Human3.6M (Ionescu et al., 2013), and MPII3DHP (Mehta et al., 2017)."

Dataset Splits: Yes
  "We follow the same dataset split and setup as done in previous works and evaluated on 3DPW (Von Marcard et al., 2018), Human3.6M (Ionescu et al., 2013), and MPII3DHP (Mehta et al., 2017). More details on the construction of the training dataset are in Appendix D. Note that our training dataset is about 2.5% smaller than previous works because we hold out a small portion (~1500 datapoints) for calibration."
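The held-out calibration set described above (~1500 datapoints withheld from training) can be sketched as a simple random split. The quote does not specify the sampling procedure, so the uniform shuffle and fixed seed below are assumptions for illustration:

```python
import random

def calibration_split(dataset, n_calib=1500, seed=0):
    """Hold out n_calib randomly chosen points for conformal calibration;
    return (train_set, calibration_set). Sampling scheme is assumed uniform."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    calib_idx, train_idx = indices[:n_calib], indices[n_calib:]
    return [dataset[i] for i in train_idx], [dataset[i] for i in calib_idx]
```

On a training set of roughly 60,000 points, holding out 1,500 corresponds to the ~2.5% reduction the response mentions.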
Hardware Specification: Yes
  "Our model is trained using an NVIDIA V100 GPU, where training consumes an amortized GPU memory of 20 GB, and CPU memory of 160 GB."

Software Dependencies: No
  "The paper mentions an Adam optimizer and a cosine scheduler but does not specify versions for any software libraries or programming languages used."

Experiment Setup: Yes
  "We use an Adam optimizer with a weight decay of 0.1 and a momentum of 0.9. The adversarial loss weight is 0.6 and is optimized every 100 iterations. Our model is trained using an NVIDIA V100 GPU, where training consumes an amortized GPU memory of 20 GB, and CPU memory of 160 GB. We train the model for 100 epochs with an initial learning rate of 5e-5 with a cosine scheduler. The ensemble augmentation step produces 20 samples for the same input datapoint."
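The cosine learning-rate schedule quoted above can be sketched as standard cosine annealing from the initial rate of 5e-5 over 100 epochs. The minimum rate (0 here) is an assumption, as the quote does not specify it:

```python
import math

def cosine_lr(epoch, total_epochs=100, base_lr=5e-5, min_lr=0.0):
    """Standard cosine annealing: returns base_lr at epoch 0 and decays
    smoothly to min_lr at total_epochs. min_lr=0 is an assumed default."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cosine
```

Under this schedule the rate is halved at the midpoint (epoch 50 gives 2.5e-5) and reaches the minimum at epoch 100.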