Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Authors: Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches."
Researcher Affiliation | Collaboration | "Jeremiah Zhe Liu, Google Research & Harvard University"
Pseudocode | Yes | "Algorithm 1 SNGP Training; Algorithm 2 SNGP Prediction"
Open Source Code | Yes | "Code available at https://github.com/google/uncertainty-baselines/tree/master/baselines."
Open Datasets | Yes | "On a suite of vision and language understanding tasks and on modern architectures (Wide ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches."
Dataset Splits | No | The paper mentions training data and test data, but does not explicitly specify train/validation/test splits by percentage, count, or a clear reference to predefined splits. It refers to Appendix C for "full experimental details", which is not included in the provided text.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions Wide ResNet, BERT-base, and the uncertainty-baselines framework, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "For all models that use GP layer, we keep DL = 1024 and compute predictive distribution by performing Monte Carlo averaging with 10 samples. We evaluate SNGP on a Wide ResNet 28-10 [83] for image classification, and BERT-base [18] for language understanding. We compare against a deterministic baseline and two ensemble approaches: MC Dropout (with 10 dropout samples) and deep ensembles (with 10 models), all trained with a dense output layer and no spectral regularization."
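The Experiment Setup quote above describes computing the predictive distribution by Monte Carlo averaging over 10 samples from the GP output layer. The sketch below illustrates that averaging step only; it is a minimal NumPy illustration, not the paper's implementation, and the function name, shapes, and the assumption of a diagonal logit covariance are all ours.

```python
import numpy as np

def mc_softmax_predict(logit_mean, logit_var, num_samples=10, seed=0):
    """Monte Carlo-average softmax probabilities over sampled logits.

    The GP output layer yields a Gaussian over class logits; the
    predictive distribution is estimated by drawing `num_samples` logit
    samples and averaging their softmax outputs. Hypothetical sketch:
    assumes a diagonal (per-logit) variance `logit_var`.
    """
    rng = np.random.default_rng(seed)
    # Sample logits: shape (num_samples, batch, num_classes).
    noise = rng.standard_normal((num_samples,) + logit_mean.shape)
    logits = logit_mean + np.sqrt(logit_var) * noise
    # Numerically stable softmax per sample.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Average over the Monte Carlo samples.
    return probs.mean(axis=0)
```

Averaging softmax outputs (rather than logits) is what lets the logit variance widen the predictive distribution on uncertain inputs.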