Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Authors: Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches."
Researcher Affiliation | Collaboration | "Jeremiah Zhe Liu, Google Research & Harvard University"
Pseudocode | Yes | "Algorithm 1 SNGP Training; Algorithm 2 SNGP Prediction"
Open Source Code | Yes | "Code available at https://github.com/google/uncertainty-baselines/tree/master/baselines."
Open Datasets | Yes | "On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches."
Dataset Splits | No | The paper mentions training data and test data, but does not explicitly specify train/validation/test splits by percentage, count, or a clear reference to predefined splits. It refers to Appendix C for "full experimental details", which is not included in the provided text.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions Wide ResNet, BERT-base, and the uncertainty-baselines framework, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "For all models that use GP layer, we keep DL = 1024 and compute predictive distribution by performing Monte Carlo averaging with 10 samples. We evaluate SNGP on a Wide ResNet 28-10 [83] for image classification, and BERT-base [18] for language understanding. We compare against a deterministic baseline and two ensemble approaches: MC Dropout (with 10 dropout samples) and deep ensembles (with 10 models), all trained with a dense output layer and no spectral regularization."
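The Experiment Setup quote above describes computing the predictive distribution by Monte Carlo averaging over 10 samples from the GP output layer. The sketch below illustrates that averaging step only; it is a minimal NumPy illustration, not the paper's implementation, and the function name, shapes, and the assumption of a diagonal logit covariance are all ours.

```python
import numpy as np

def mc_softmax_predict(logit_mean, logit_var, num_samples=10, seed=0):
    """Monte Carlo-average softmax probabilities over sampled logits.

    The GP output layer yields a Gaussian over class logits; the
    predictive distribution is estimated by drawing `num_samples` logit
    samples and averaging their softmax outputs. Hypothetical sketch:
    assumes a diagonal (per-logit) variance `logit_var`.
    """
    rng = np.random.default_rng(seed)
    # Sample logits: shape (num_samples, batch, num_classes).
    noise = rng.standard_normal((num_samples,) + logit_mean.shape)
    logits = logit_mean + np.sqrt(logit_var) * noise
    # Numerically stable softmax per sample.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Average over the Monte Carlo samples.
    return probs.mean(axis=0)
```

Averaging softmax outputs (rather than logits) is what lets the logit variance widen the predictive distribution on uncertain inputs.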