Reliable Decisions with Threshold Calibration
Authors: Roshni Sahoo, Shengjia Zhao, Alyssa Chen, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, threshold calibration improves decision loss prediction without compromising decision quality in two real-world settings: hospital scheduling decisions and resource allocation decisions. |
| Researcher Affiliation | Academia | Roshni Sahoo, Stanford University (rsahoo@stanford.edu); Shengjia Zhao, Stanford University (sjzhao@stanford.edu); Alyssa Chen, UTSW Medical Center (alyssa.chen@utsw.edu); Stefano Ermon, Stanford University (ermon@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Threshold Recalibration |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] and Did you include any new assets either in the supplemental material or as a URL? [Yes] See Appendix A. |
| Open Datasets | Yes | MIMIC-III. Patient length-of-stay predictions are used for hospital scheduling and resource management [17]. We consider a patient length-of-stay forecaster trained on patient admission laboratory values from the MIMIC-III dataset [20]. |
| Dataset Splits | Yes | We use a train/validation/test split. The uncalibrated forecaster is a neural network trained on the training set with the validation set used for early stopping. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A. |
| Software Dependencies | No | The paper mentions using a 'neural network' and 'isotonic regression' but does not specify version numbers for any software libraries or frameworks. The details are deferred to Appendix A, which is not provided. |
| Experiment Setup | No | The 'Experimental Setup' section describes the data splits and general training process (e.g., 'validation set used for early stopping'), but specific hyperparameter values (e.g., learning rate, batch size) are not provided in the main text and are likely deferred to Appendix A. |
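The Dataset Splits row describes a standard setup: a neural forecaster trained on the training set, with the validation set used for early stopping. A minimal sketch of that protocol, using synthetic data in place of MIMIC-III (which is access-controlled) and scikit-learn's `MLPRegressor` as an assumed stand-in for the paper's unspecified network:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for MIMIC-III features and length-of-stay targets;
# the real dataset is not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1000)

# Hold out a test set; MLPRegressor then carves its own validation split
# out of the training data for early stopping.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forecaster = MLPRegressor(
    hidden_layer_sizes=(32,),
    early_stopping=True,       # stop when validation score plateaus
    validation_fraction=0.2,   # validation split held out of X_train
    n_iter_no_change=10,
    max_iter=500,
    random_state=0,
)
forecaster.fit(X_train, y_train)
test_r2 = forecaster.score(X_test, y_test)
```

The hyperparameters above (hidden size, patience, validation fraction) are placeholders; the paper defers the actual values to Appendix A.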
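The Software Dependencies row mentions isotonic regression, the usual tool for distribution recalibration of a probabilistic forecaster. The sketch below is plain isotonic recalibration of CDF levels against empirical frequencies, not the paper's Algorithm 1 (threshold recalibration, which targets decision losses); the overconfident Gaussian forecaster is a made-up example:

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

# Overconfident synthetic forecaster: it reports a N(mu, 1) predictive
# distribution, but outcomes actually have noise std 2.
rng = np.random.default_rng(1)
mu = rng.normal(size=500)
y = mu + rng.normal(scale=2.0, size=500)
pit = norm.cdf(y, loc=mu, scale=1.0)  # probability integral transform values

# Isotonic regression learns a monotone map R with R(p) ~ P(PIT <= p),
# so composing R with the predicted CDF yields a calibrated forecaster.
levels = np.sort(pit)
empirical = np.arange(1, len(levels) + 1) / len(levels)
recal = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
recal.fit(levels, empirical)

# For an overconfident forecaster, high nominal levels map to lower
# empirical coverage, e.g. the 0.8 level covers well under 80%.
adjusted = recal.predict([0.8])[0]
```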