Reliable Decisions with Threshold Calibration

Authors: Roshni Sahoo, Shengjia Zhao, Alyssa Chen, Stefano Ermon

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, threshold calibration improves decision loss prediction without compromising on the quality of the decisions in two real-world settings: hospital scheduling decisions and resource allocation decisions. |
| Researcher Affiliation | Academia | Roshni Sahoo, Stanford University (rsahoo@stanford.edu); Shengjia Zhao, Stanford University (sjzhao@stanford.edu); Alyssa Chen, UTSW Medical Center (alyssa.chen@utsw.edu); Stefano Ermon, Stanford University (ermon@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Threshold Recalibration |
| Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]" and "Did you include any new assets either in the supplemental material or as a URL? [Yes] See Appendix A." |
| Open Datasets | Yes | MIMIC-III. "Patient length-of-stay predictions are used for hospital scheduling and resource management [17]. We consider a patient length-of-stay forecaster trained on patient admission laboratory values from the MIMIC-III dataset [20]." |
| Dataset Splits | Yes | "We use a train/validation/test split. The uncalibrated forecaster is a neural network trained on the training set with the validation set used for early stopping." |
| Hardware Specification | Yes | "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A." |
| Software Dependencies | No | The paper mentions using a "neural network" and "isotonic regression" but does not specify version numbers for any software libraries or frameworks. The details are deferred to Appendix A, which is not provided. |
| Experiment Setup | No | The "Experimental Setup" section describes the data splits and general training process (e.g., "validation set used for early stopping"), but specific hyperparameter values (e.g., learning rate, batch size) are not given in the main text and are likely deferred to Appendix A. |
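The paper's Algorithm 1 (Threshold Recalibration) is not reproduced in this report, but the general idea it draws on — recalibrating a probabilistic forecaster threshold-by-threshold with isotonic regression — can be sketched as follows. Everything below (the miscalibrated forecaster, the threshold grid, and the synthetic calibration data) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
# Minimal sketch of per-threshold recalibration via isotonic regression.
# NOT the paper's exact Algorithm 1: the forecaster, threshold grid, and
# data are hypothetical stand-ins that only illustrate the idea.
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic calibration split: outcome y depends on feature x.
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(scale=1.0, size=n)

def predict_cdf(x, c):
    """Miscalibrated forecaster for P(Y <= c | x): assumes scale 2, true is 1."""
    return norm.cdf((c - x) / 2.0)

# For each decision threshold c, fit a monotone map from the forecaster's
# predicted probability of the event {y <= c} to its empirical frequency.
thresholds = np.linspace(-2.0, 2.0, 9)
recalibrators = {}
for c in thresholds:
    p_hat = predict_cdf(x, c)          # predicted P(Y <= c | x)
    hit = (y <= c).astype(float)       # observed indicator of the event
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(p_hat, hit)
    recalibrators[c] = iso

def calibration_error(probs, hits, bins=10):
    """Expected calibration error over equal-width probability bins."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo)
        if mask.any():
            err += mask.mean() * abs(probs[mask].mean() - hits[mask].mean())
    return err

# On the calibration data, the recalibrated probabilities should track
# empirical frequencies more closely than the raw forecaster's.
c = 0.0
hit = (y <= c).astype(float)
p_raw = predict_cdf(x, c)
p_cal = recalibrators[c].predict(p_raw)
err_raw = calibration_error(p_raw, hit)
err_cal = calibration_error(p_cal, hit)
```

In a full pipeline one would fit the recalibrators on a held-out calibration split (here the paper's validation set) and evaluate on the test split; the single-split version above is kept short for clarity.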