Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Authors: Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches. |
| Researcher Affiliation | Collaboration | Jeremiah Zhe Liu, Google Research & Harvard University |
| Pseudocode | Yes | Algorithm 1 SNGP Training; Algorithm 2 SNGP Prediction |
| Open Source Code | Yes | Code available at https://github.com/google/uncertainty-baselines/tree/master/baselines. |
| Open Datasets | Yes | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches. |
| Dataset Splits | No | The paper mentions training data and test data, but does not explicitly specify train/validation/test splits by percentage, count, or a clear reference to predefined splits within the provided text. It refers to Appendix C for 'full experimental details', which is not included in the provided text. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Wide ResNet, BERT-base, and the uncertainty baselines framework, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For all models that use GP layer, we keep D_L = 1024 and compute predictive distribution by performing Monte Carlo averaging with 10 samples. We evaluate SNGP on a Wide ResNet 28-10 [83] for image classification, and BERT-base [18] for language understanding. We compare against a deterministic baseline and two ensemble approaches: MC Dropout (with 10 dropout samples) and deep ensembles (with 10 models), all trained with a dense output layer and no spectral regularization. |
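
The Experiment Setup row mentions computing the predictive distribution by Monte Carlo averaging with 10 samples. The following is a minimal sketch of what such averaging could look like, not the authors' implementation. It assumes each class logit is an independent Gaussian with a given mean and variance; the function names `softmax` and `mc_predictive` and the example inputs are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predictive(logit_mean, logit_var, num_samples=10, seed=0):
    """Average softmax outputs over sampled logits.

    Assumption (illustrative, not taken from the paper's text): each class
    logit is an independent Gaussian with the given mean and variance.
    """
    rng = random.Random(seed)
    avg = [0.0] * len(logit_mean)
    for _ in range(num_samples):
        # Draw one logit vector from the assumed per-class Gaussians.
        sample = [rng.gauss(m, math.sqrt(v))
                  for m, v in zip(logit_mean, logit_var)]
        probs = softmax(sample)
        # Accumulate the running Monte Carlo average.
        avg = [a + p / num_samples for a, p in zip(avg, probs)]
    return avg


# Hypothetical inputs: identical mean logits, low vs. high variance.
confident = mc_predictive([2.0, 0.0, -1.0], [0.01, 0.01, 0.01])
uncertain = mc_predictive([2.0, 0.0, -1.0], [25.0, 25.0, 25.0])
```

With zero variance the result reduces to the ordinary softmax of the mean logits. With larger variance the averaged distribution generally tends to be flatter, although with only 10 samples the estimate is noisy.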