Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Authors: Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches. |
| Researcher Affiliation | Collaboration | Jeremiah Zhe Liu Google Research & Harvard University jereliu@google.com |
| Pseudocode | Yes | Algorithm 1 SNGP Training; Algorithm 2 SNGP Prediction (a minimal sketch of the GP output layer these algorithms describe appears after the table) |
| Open Source Code | Yes | Code available at https://github.com/google/uncertainty-baselines/tree/master/baselines. |
| Open Datasets | Yes | On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches. |
| Dataset Splits | No | The paper mentions training data and test data, but does not explicitly specify train/validation/test splits by percentage, count, or a clear reference to predefined splits within the provided text. It refers to Appendix C for 'full experimental details', which is not included in the provided text. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Wide ResNet, BERT-base, and the uncertainty-baselines framework, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For all models that use the GP layer, we keep D_L = 1024 and compute the predictive distribution by performing Monte Carlo averaging with 10 samples. We evaluate SNGP on a Wide ResNet 28-10 [83] for image classification, and BERT-base [18] for language understanding. We compare against a deterministic baseline and two ensemble approaches: MC Dropout (with 10 dropout samples) and deep ensembles (with 10 models), all trained with a dense output layer and no spectral regularization (see the spectral-normalization sketch after the table). |
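
The Pseudocode row points to Algorithm 1 (SNGP Training) and Algorithm 2 (SNGP Prediction) in the paper. As a rough illustration of the random-feature GP output layer those algorithms describe, here is a minimal NumPy sketch for the binary case. The helper names (`rff_features`, `laplace_precision`, `predict_mc`), the hidden width `D_H`, and the toy usage data are our assumptions for illustration; the paper's actual multi-class implementation lives in the uncertainty-baselines repository linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

D_H = 128    # hidden-representation width (illustrative choice, not from the paper)
D_L = 1024   # random-feature dimension; the paper keeps D_L = 1024

# Fixed (non-trainable) random-feature parameters, sampled once as in Algorithm 1.
W = rng.normal(size=(D_L, D_H))
b = rng.uniform(0.0, 2.0 * np.pi, size=D_L)

def rff_features(h):
    """Random Fourier features Phi(h) = sqrt(2/D_L) * cos(W h + b),
    approximating an RBF-kernel GP over the hidden representation h."""
    return np.sqrt(2.0 / D_L) * np.cos(h @ W.T + b)

def laplace_precision(Phi, probs):
    """Laplace-approximation precision of the posterior over the GP output
    weights (binary case): Sigma^{-1} = I + sum_i p_i (1 - p_i) Phi_i Phi_i^T,
    accumulated over the training set in the final epoch."""
    s = probs * (1.0 - probs)                       # per-example variance terms
    return np.eye(D_L) + (Phi * s[:, None]).T @ Phi

def predict_mc(phi, beta, Sigma, num_samples=10):
    """Algorithm 2, sketched: sample logits from the Gaussian posterior
    N(phi @ beta, phi @ Sigma @ phi) and Monte Carlo average the sigmoid,
    matching the 10-sample averaging noted in the Experiment Setup row."""
    mean = phi @ beta
    var = phi @ Sigma @ phi
    logits = rng.normal(mean, np.sqrt(var), size=num_samples)
    return np.mean(1.0 / (1.0 + np.exp(-logits)))

# Toy usage with random data; `beta` stands in for SGD-trained output weights.
h_train = rng.normal(size=(256, D_H))
Phi = rff_features(h_train)
beta = 0.1 * rng.normal(size=D_L)
probs = 1.0 / (1.0 + np.exp(-(Phi @ beta)))
Sigma = np.linalg.inv(laplace_precision(Phi, probs))
print(predict_mc(rff_features(rng.normal(size=D_H)), beta, Sigma))
```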
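The Experiment Setup row also notes that the baselines are trained with "no spectral regularization"; in SNGP itself, the hidden weight matrices are spectral-normalized so the feature extractor stays approximately distance-preserving. Below is a minimal power-iteration sketch of that normalization step, again in NumPy; the function name `spectral_normalize` and the default norm bound `c=1.0` are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def spectral_normalize(W, c=1.0, num_power_iter=1, u=None, rng=None):
    """Estimate the top singular value sigma(W) via power iteration, then
    rescale W so its spectral norm is at most c. In SNGP this soft
    normalization (W <- c * W / sigma when sigma > c) is applied to each
    hidden weight matrix during training."""
    rng = rng or np.random.default_rng(0)
    if u is None:
        u = rng.normal(size=W.shape[0])
    for _ in range(num_power_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v          # estimated top singular value
    if sigma > c:
        W = (c / sigma) * W    # project back onto the spectral-norm ball
    return W, u                # return u so it can be reused across steps
```

Returning and reusing `u` between training steps is the standard trick that makes a single power iteration per update sufficient, since the leading singular vector changes slowly under SGD.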