Accurate Uncertainties for Deep Learning Using Calibrated Regression

Authors: Volodymyr Kuleshov, Nathan Fenner, Stefano Ermon

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed algorithm on a range of Bayesian models, including Bayesian linear regression as well as feedforward and recurrent Bayesian neural networks. Our method consistently produces well-calibrated confidence estimates, which are in turn useful for several tasks in time series forecasting and model-based reinforcement learning.
Researcher Affiliation | Collaboration | ¹Stanford University, Stanford, California; ²Afresh Technologies, San Francisco, California.
Pseudocode | Yes | Algorithm 1 (Recalibration of Regression Models). Input: uncalibrated model H : X → (Y → [0, 1]) and calibration set S = {(x_t, y_t)}_{t=1}^T. Output: auxiliary recalibration model R : [0, 1] → [0, 1]. (A minimal Python sketch of this procedure follows the table.)
Open Source Code | No | The paper does not provide any links to, or explicit statements about, a public release of source code for the described methodology.
Open Datasets | Yes | Datasets. We use eight UCI datasets varying in size from 194 to 8192 examples; examples carry between 6 and 159 continuous features.
Dataset Splits | No | There is generally no standard train/test split, hence we randomly assign 25% of each dataset for testing, and use the rest for training. (See the split sketch below the table.)
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU/GPU models, cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions various models and techniques (e.g., Bayesian Ridge Regression, dropout, Concrete dropout, isotonic regression, GRU, DenseNet) but does not provide specific version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | In UCI experiments, the feedforward neural network has two layers of 128 hidden units with a dropout rate of 0.5 and parametric ReLU non-linearities. Recurrent networks are based on a standard GRU architecture with two stacked layers and a recurrent dropout of 0.5 (Gal and Ghahramani, 2016b). (See the architecture sketch below the table.)
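
The pseudocode row above summarizes Algorithm 1 well enough to reconstruct its core step: evaluate the model's CDF at each observed calibration target, compute the matching empirical probabilities, and fit a monotone map between the two. Below is a minimal sketch, not the authors' released code (none is public); it assumes the uncalibrated model exposes per-example CDF values H(x_t)(y_t) as a NumPy array, and it uses scikit-learn's isotonic regression, the recalibration method the paper itself adopts.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(cdf_values):
    """Fit the auxiliary model R : [0, 1] -> [0, 1] from Algorithm 1.

    cdf_values: H(x_t)(y_t) for each point in the calibration set,
    i.e. the predicted CDF evaluated at the observed target.
    """
    cdf_values = np.asarray(cdf_values, dtype=float)
    # Empirical frequency of each predicted probability level:
    # P_hat(p) = |{t : H(x_t)(y_t) <= p}| / T
    empirical = np.array([(cdf_values <= p).mean() for p in cdf_values])
    # Isotonic regression gives a monotone map from predicted
    # probabilities to empirical ones.
    recalibrator = IsotonicRegression(y_min=0.0, y_max=1.0,
                                      out_of_bounds="clip")
    recalibrator.fit(cdf_values, empirical)
    return recalibrator
```

At test time the calibrated CDF of a new input x is R(H(x)(·)), i.e. `recalibrator.predict` composed with the model's predicted CDF.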
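
For the dataset-splits row, the quoted protocol is an ordinary random hold-out. A one-line sketch assuming a scikit-learn workflow; the placeholder data and the random seed are illustrative, not from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder standing in for one of the eight UCI datasets
# (up to 8192 examples, up to 159 continuous features).
X, y = np.random.randn(8192, 159), np.random.randn(8192)

# "randomly assign 25% of each dataset for testing, and use the rest for training"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
```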
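
The experiment-setup row fully specifies the two UCI architectures, so they can be reconstructed directly. A PyTorch sketch under those settings; the input/output dimensions and the GRU hidden size are assumptions, since the quote does not state them:

```python
import torch.nn as nn

class FeedforwardUCI(nn.Module):
    """Two layers of 128 hidden units, dropout 0.5, parametric ReLU."""
    def __init__(self, n_features):  # n_features: 6-159 for the UCI datasets
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.PReLU(), nn.Dropout(0.5),
            nn.Linear(128, 128), nn.PReLU(), nn.Dropout(0.5),
            nn.Linear(128, 1),  # scalar regression output
        )

    def forward(self, x):
        return self.net(x)

# Two stacked GRU layers with dropout 0.5 between them; hidden size 128 is
# an assumed value. Note: nn.GRU's `dropout` applies between layers, whereas
# the variational recurrent dropout of Gal & Ghahramani (2016b) would need a
# custom cell and is not reproduced here.
recurrent = nn.GRU(input_size=1, hidden_size=128, num_layers=2,
                   dropout=0.5, batch_first=True)
```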