Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Authors: Michael Dusenberry, Ghassen Jerfel, Yeming Wen, Yian Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants. |
| Researcher Affiliation | Collaboration | ¹Google Brain, Mountain View, USA; ²Duke University, Durham, USA; ³University of Toronto, Toronto, CA; ⁴University of California, San Diego, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/google/edward2. |
| Open Datasets | Yes | For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants. MIMIC-III (Johnson et al., 2016). |
| Dataset Splits | No | The paper mentions 'Validation Test Method' for MIMIC-III but does not provide specific details on the dataset splits (percentages, counts, or methodology) for training, validation, or testing across any of the datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions 'Edward2' as a code reference, but does not provide specific version numbers for Edward2 or any other software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | For each, we tune over the total number of training epochs, and measure NLL, accuracy, and ECE on both the test set and CIFAR-10-C corruptions dataset. As the number of mixture components increases from 1 to 8, the performance across all metrics increases. At K = 16, however, there is a decline in performance. Based on our findings, all experiments in Section 4 use K = 4. |