Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty

Authors: Harry Zhang, Luca Carlone

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variables (each entry gives the classification result, followed by the supporting LLM response):
Research Type: Experimental
  "We provide a quantitative evaluation of our model against state-of-the-art baselines. We also provide ablation studies and qualitative results to support our design choices. We follow previous baselines (Shen et al., 2023; Kanazawa et al., 2018; 2019; Dwivedi et al., 2024; Choi et al., 2021) and report several intra-frame metrics, including Mean Per Joint Position Error (MPJPE), Procrustes-aligned MPJPE (PA-MPJPE), and Mean Per Vertex Position Error (MPVPE)."
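The pose metrics named in the response above are standard in the human pose-estimation literature. As a minimal illustration (not the paper's implementation), MPJPE averages the Euclidean distance between predicted and ground-truth 3D joints; PA-MPJPE applies a rigid Procrustes alignment first, and MPVPE computes the same average over mesh vertices instead of joints:

```python
import math

def mpjpe(pred_joints, gt_joints):
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth 3D joints, each given as (x, y, z) tuples."""
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)
```

For example, a prediction that is exact on one joint and off by 1 unit on another yields an MPJPE of 0.5 units (errors are typically reported in millimeters).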
Researcher Affiliation: Academia
  "Massachusetts Institute of Technology. Correspondence to: Harry Zhang <EMAIL>."

Pseudocode: No
  "The paper describes methods and processes but does not include any explicitly labeled pseudocode or algorithm blocks."

Open Source Code: No
  "3D visualization, code, and data will be available at this website."

Open Datasets: Yes
  "We follow the same dataset split and setup as done in previous works and evaluated on 3DPW (Von Marcard et al., 2018), Human3.6M (Ionescu et al., 2013), and MPII3DHP (Mehta et al., 2017)."

Dataset Splits: Yes
  "We follow the same dataset split and setup as done in previous works and evaluated on 3DPW (Von Marcard et al., 2018), Human3.6M (Ionescu et al., 2013), and MPII3DHP (Mehta et al., 2017). More details on the construction of the training dataset are in Appendix D. Note that our training dataset is about 2.5% smaller than previous works because we hold out a small portion (~1500 datapoints) for calibration."
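The held-out calibration set described above (~1500 datapoints withheld from training) can be sketched as a simple random split. The quote does not specify the sampling procedure, so the uniform shuffle and fixed seed below are assumptions for illustration:

```python
import random

def calibration_split(dataset, n_calib=1500, seed=0):
    """Hold out n_calib randomly chosen points for conformal calibration;
    return (train_set, calibration_set). Sampling scheme is assumed uniform."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    calib_idx, train_idx = indices[:n_calib], indices[n_calib:]
    return [dataset[i] for i in train_idx], [dataset[i] for i in calib_idx]
```

On a training set of roughly 60,000 points, holding out 1,500 corresponds to the ~2.5% reduction the response mentions.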
Hardware Specification: Yes
  "Our model is trained using an NVIDIA V100 GPU, where training consumes an amortized GPU memory of 20 GB, and CPU memory of 160 GB."

Software Dependencies: No
  "The paper mentions an Adam optimizer and a cosine scheduler but does not specify versions for any software libraries or programming languages used."

Experiment Setup: Yes
  "We use an Adam optimizer with a weight decay of 0.1 and a momentum of 0.9. The adversarial loss weight is 0.6 and is optimized every 100 iterations. Our model is trained using an NVIDIA V100 GPU, where training consumes an amortized GPU memory of 20 GB, and CPU memory of 160 GB. We train the model for 100 epochs with an initial learning rate of 5e-5 with a cosine scheduler. The ensemble augmentation step produces 20 samples for the same input datapoint."
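The cosine learning-rate schedule quoted above can be sketched as standard cosine annealing from the initial rate of 5e-5 over 100 epochs. The minimum rate (0 here) is an assumption, as the quote does not specify it:

```python
import math

def cosine_lr(epoch, total_epochs=100, base_lr=5e-5, min_lr=0.0):
    """Standard cosine annealing: returns base_lr at epoch 0 and decays
    smoothly to min_lr at total_epochs. min_lr=0 is an assumed default."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cosine
```

Under this schedule the rate is halved at the midpoint (epoch 50 gives 2.5e-5) and reaches the minimum at epoch 100.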