Reliable Decisions with Threshold Calibration

Authors: Roshni Sahoo, Shengjia Zhao, Alyssa Chen, Stefano Ermon

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, threshold calibration improves decision loss prediction without compromising on the quality of the decisions in two real-world settings: hospital scheduling decisions and resource allocation decisions. |
| Researcher Affiliation | Academia | Roshni Sahoo, Stanford University (rsahoo@stanford.edu); Shengjia Zhao, Stanford University (sjzhao@stanford.edu); Alyssa Chen, UTSW Medical Center (alyssa.chen@utsw.edu); Stefano Ermon, Stanford University (ermon@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Threshold Recalibration |
| Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]" and "Did you include any new assets either in the supplemental material or as a URL? [Yes] See Appendix A." |
| Open Datasets | Yes | MIMIC-III. "Patient length-of-stay predictions are used for hospital scheduling and resource management [17]. We consider a patient length-of-stay forecaster trained on patient admission laboratory values from the MIMIC-III dataset [20]." |
| Dataset Splits | Yes | "We use a train/validation/test split. The uncalibrated forecaster is a neural network trained on the training set with the validation set used for early stopping." |
| Hardware Specification | Yes | "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A." |
| Software Dependencies | No | The paper mentions using a "neural network" and "isotonic regression" but does not specify version numbers for any software libraries or frameworks. The details are deferred to Appendix A, which is not provided. |
| Experiment Setup | No | The "Experimental Setup" section describes the data splits and general training process (e.g., "validation set used for early stopping"), but specific hyperparameter values (e.g., learning rate, batch size) are not given in the main text and are likely deferred to Appendix A. |
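The paper's Algorithm 1 (Threshold Recalibration) is not reproduced in this report, but the general idea it draws on — recalibrating a probabilistic forecaster threshold-by-threshold with isotonic regression — can be sketched as follows. Everything below (the miscalibrated forecaster, the threshold grid, and the synthetic calibration data) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
# Minimal sketch of per-threshold recalibration via isotonic regression.
# NOT the paper's exact Algorithm 1: the forecaster, threshold grid, and
# data are hypothetical stand-ins that only illustrate the idea.
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic calibration split: outcome y depends on feature x.
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(scale=1.0, size=n)

def predict_cdf(x, c):
    """Miscalibrated forecaster for P(Y <= c | x): assumes scale 2, true is 1."""
    return norm.cdf((c - x) / 2.0)

# For each decision threshold c, fit a monotone map from the forecaster's
# predicted probability of the event {y <= c} to its empirical frequency.
thresholds = np.linspace(-2.0, 2.0, 9)
recalibrators = {}
for c in thresholds:
    p_hat = predict_cdf(x, c)          # predicted P(Y <= c | x)
    hit = (y <= c).astype(float)       # observed indicator of the event
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(p_hat, hit)
    recalibrators[c] = iso

def calibration_error(probs, hits, bins=10):
    """Expected calibration error over equal-width probability bins."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo)
        if mask.any():
            err += mask.mean() * abs(probs[mask].mean() - hits[mask].mean())
    return err

# On the calibration data, the recalibrated probabilities should track
# empirical frequencies more closely than the raw forecaster's.
c = 0.0
hit = (y <= c).astype(float)
p_raw = predict_cdf(x, c)
p_cal = recalibrators[c].predict(p_raw)
err_raw = calibration_error(p_raw, hit)
err_cal = calibration_error(p_cal, hit)
```

In a full pipeline one would fit the recalibrators on a held-out calibration split (here the paper's validation set) and evaluate on the test split; the single-split version above is kept short for clarity.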