Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration

Authors: Shengjia Zhao, Michael Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon

NeurIPS 2021 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate our recalibration algorithm empirically: compared to existing methods, decision calibration improves decision-making on skin lesion and ImageNet classification with modern neural network predictors." |
| Researcher Affiliation | Academia | Shengjia Zhao, Stanford University, EMAIL; Michael P. Kim, UC Berkeley, EMAIL; Roshni Sahoo, Stanford University, EMAIL; Tengyu Ma, Stanford University, EMAIL; Stefano Ermon, Stanford University, EMAIL |
| Pseudocode | Yes | "Algorithm 1: Recalibration algorithm to achieve LK decision calibration." |
| Open Source Code | No | The paper does not provide any specific statements or links regarding the release of open-source code for the described methodology. |
| Open Datasets | Yes | "We use the HAM10000 dataset (Tschandl et al., 2018)." |
| Dataset Splits | Yes | "We partition the dataset into train/validation/test sets, where approximately 15% of the data are used for validation, while 10% are used for the test set." |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions 'pytorch' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | "For modeling we use the densenet-121 architecture (Huang et al., 2017), which achieves around 90% accuracy. ... For these experiments we set the number of actions K = 3." |
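The ~75/15/10 train/validation/test partition quoted above can be sketched in plain Python. The paper does not describe the exact splitting procedure or random seed, so the shuffling and seed below are illustrative assumptions:

```python
import random

def split_indices(n, val_frac=0.15, test_frac=0.10, seed=0):
    """Partition n dataset indices into train/validation/test sets.

    Fractions follow the paper's description (~15% validation, 10% test);
    the shuffle-based procedure and seed are illustrative assumptions.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

# Example with the HAM10000 dataset size (10,015 images)
train, val, test = split_indices(10015)
```

The three index lists are disjoint and together cover the whole dataset, so each image lands in exactly one split.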
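The experiment setup fixes the number of actions at K = 3. Decision calibration concerns the quality of loss-minimizing actions taken under a predicted class distribution; the sketch below shows that generic selection step. The loss matrix and action names are hypothetical, not taken from the paper:

```python
def bayes_action(probs, loss):
    """Return the action minimizing expected loss under predicted probs.

    probs: predicted class probabilities, length C.
    loss:  loss[a][y] = loss incurred by action a when the true class is y.
    The decision rule is standard; the paper's actual loss matrices are
    not reproduced here.
    """
    expected = [sum(p * l for p, l in zip(probs, row)) for row in loss]
    return min(range(len(loss)), key=lambda a: expected[a])

# K = 3 hypothetical actions over a binary (e.g. benign/malignant) prediction
loss = [
    [0.0, 10.0],  # action 0: treat as benign (costly if malignant)
    [5.0, 0.0],   # action 1: treat as malignant (costly if benign)
    [1.0, 1.0],   # action 2: refer to a specialist (small fixed cost)
]
```

A decision-calibrated predictor guarantees that the expected losses computed from `probs` match the losses actually realized by the induced actions, for every loss function with K actions.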