Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Authors: Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.
Researcher Affiliation | Collaboration | Jeremiah Zhe Liu, Google Research & Harvard University, jereliu@google.com
Pseudocode | Yes | Algorithm 1 SNGP Training; Algorithm 2 SNGP Prediction (a minimal sketch of both algorithms follows the table)
Open Source Code | Yes | Code available at https://github.com/google/uncertainty-baselines/tree/master/baselines.
Open Datasets | Yes | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.
Dataset Splits | No | The paper mentions training data and test data, but does not explicitly specify train/validation/test splits by percentage, count, or a clear reference to predefined splits within the provided text. It refers to Appendix C for 'full experimental details', which is not included in the provided text.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using Wide ResNet, BERT_base, and the Uncertainty Baselines framework, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For all models that use the GP layer, we keep D_L = 1024 and compute the predictive distribution by performing Monte Carlo averaging with 10 samples. We evaluate SNGP on a Wide ResNet 28-10 [83] for image classification, and BERT_base [18] for language understanding. We compare against a deterministic baseline and two ensemble approaches: MC Dropout (with 10 dropout samples) and deep ensembles (with 10 models), all trained with a dense output layer and no spectral regularization. (The prediction sketch after the table illustrates the D_L = 1024, 10-sample setting.)
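The Pseudocode row above refers to Algorithm 1 (SNGP Training) and Algorithm 2 (SNGP Prediction). Below is a minimal NumPy sketch of the random-feature Gaussian-process output layer and the Laplace-style precision update that Algorithm 1 describes. The dimensions, the helper names (`rff_features`, `ridge`), and the single covariance shared across classes are simplifying assumptions made for brevity, not the authors' released implementation (which lives in the uncertainty-baselines repository linked above).

```python
# Illustrative NumPy sketch of the SNGP output layer (training side).
import numpy as np

rng = np.random.default_rng(0)

hidden_dim   = 128    # width of the spectral-normalized penultimate layer (assumed)
num_features = 1024   # D_L in the paper; the experiments keep D_L = 1024
num_classes  = 10
ridge        = 1.0    # prior precision added to the Laplace precision matrix (assumed)

# Fixed random-Fourier-feature projection: Phi(h) = sqrt(2/D_L) * cos(W h + b).
W = rng.normal(size=(num_features, hidden_dim))
b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

def rff_features(h):
    """Map penultimate activations h of shape (N, hidden_dim) to GP features (N, D_L)."""
    return np.sqrt(2.0 / num_features) * np.cos(h @ W.T + b)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# In the real model the output weights beta are learned by SGD together with the
# network; a random draw stands in for them so the example runs end to end.
beta = rng.normal(scale=0.1, size=(num_features, num_classes))

# Laplace-style precision update at the end of training:
# precision = ridge * I + sum_i p_i (1 - p_i) * Phi_i Phi_i^T,
# with p_i (1 - p_i) averaged over classes here as a simplification.
h_train    = rng.normal(size=(512, hidden_dim))        # stand-in activations
Phi_train  = rff_features(h_train)
p_train    = softmax(Phi_train @ beta)
weights    = (p_train * (1.0 - p_train)).mean(axis=1)
precision  = ridge * np.eye(num_features) + (Phi_train * weights[:, None]).T @ Phi_train
covariance = np.linalg.inv(precision)
```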
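Continuing the same sketch (and reusing `rff_features`, `beta`, and `covariance` from above), the prediction step corresponding to Algorithm 2 forms the predictive distribution by Monte Carlo averaging over sampled logits, matching the "Monte Carlo averaging with 10 samples" setting quoted in the Experiment Setup row. The `predict` helper is hypothetical, not part of the released code.

```python
# Illustrative SNGP prediction: GP posterior mean/variance of the logits,
# then Monte Carlo averaging of the softmax over 10 logit samples.
def predict(h_test, num_samples=10):
    Phi = rff_features(h_test)                          # (N, D_L)
    logit_mean = Phi @ beta                             # (N, num_classes)
    # Per-example logit variance: diag(Phi @ covariance @ Phi^T).
    logit_var = np.einsum('nd,de,ne->n', Phi, covariance, Phi)
    probs = np.zeros((h_test.shape[0], num_classes))
    for _ in range(num_samples):
        eps = rng.normal(size=logit_mean.shape)
        sampled_logits = logit_mean + np.sqrt(logit_var)[:, None] * eps
        probs += softmax(sampled_logits)
    return probs / num_samples

# Example: predictive probabilities for a small batch of stand-in activations.
h_test = rng.normal(size=(4, hidden_dim))
print(predict(h_test).round(3))
```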