reproducibilityindex.ai

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Authors: Michael Dusenberry, Ghassen Jerfel, Yeming Wen, Yian Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.
Researcher Affiliation	Collaboration	1 Google Brain, Mountain View, USA; 2 Duke University, Durham, USA; 3 University of Toronto, Toronto, CA; 4 University of California, San Diego, USA.
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code: https://github.com/google/edward2.
Open Datasets	Yes	For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants. MIMIC-III (Johnson et al., 2016).
Dataset Splits	No	The paper mentions 'Validation Test Method' for MIMIC-III but does not provide specific details on the dataset splits (percentages, counts, or methodology) for training, validation, or testing across any of the datasets used.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies	No	The paper mentions 'Edward2' as a code reference, but does not provide specific version numbers for Edward2 or any other software dependencies, libraries, or programming languages used.
Experiment Setup	Yes	For each, we tune over the total number of training epochs, and measure NLL, accuracy, and ECE on both the test set and CIFAR-10-C corruptions dataset. As the number of mixture components increases from 1 to 8, the performance across all metrics increases. At K = 16, however, there is a decline in performance. Based on our findings, all experiments in Section 4 use K = 4.
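
The Experiment Setup row names three metrics: NLL, accuracy, and ECE. For readers unfamiliar with them, the following is a minimal sketch of how they are commonly computed from predicted class probabilities. It is not taken from the paper or from Edward2. The function names, the equal-width binning scheme, and the default of 15 bins are assumptions made for illustration.

```python
import math


def _argmax(p):
    # Index of the largest probability in one prediction vector.
    return max(range(len(p)), key=p.__getitem__)


def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of examples whose most probable class is the true label."""
    return sum(1 for p, y in zip(probs, labels) if _argmax(p) == y) / len(labels)


def ece(probs, labels, num_bins=15):
    """Expected calibration error with equal-width confidence bins.

    Assumption: this is the common binned estimator,
    sum over bins of (bin size / n) * |bin accuracy - bin mean confidence|.
    The paper's exact binning is not stated in this summary.
    """
    conf_sum = [0.0] * num_bins   # summed confidence per bin
    hit_sum = [0.0] * num_bins    # number of correct predictions per bin
    count = [0] * num_bins        # number of examples per bin
    for p, y in zip(probs, labels):
        pred = _argmax(p)
        conf = p[pred]
        # A confidence of exactly 1.0 falls into the last bin.
        b = min(int(conf * num_bins), num_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += 1.0 if pred == y else 0.0
        count[b] += 1
    n = len(labels)
    # (count/n) * |hits/count - conf/count| simplifies to |hits - conf| / n.
    return sum(abs(hit_sum[b] - conf_sum[b]) for b in range(num_bins) if count[b]) / n


if __name__ == "__main__":
    # Hypothetical toy predictions for a two-class problem.
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    labels = [0, 1, 1, 1]
    print(nll(probs, labels), accuracy(probs, labels), ece(probs, labels))
```

Lower NLL and ECE are better, and higher accuracy is better. This is the sense in which the row reports that performance improves as K grows from 1 to 8.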