Bayesian Layers: A Module for Neural Network Uncertainty

Authors: Dustin Tran, Mike Dusenberry, Mark van der Wilk, Danijar Hafner

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As demonstration, we fit a 5-billion parameter Bayesian Transformer on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning.
Researcher Affiliation | Industry | Dustin Tran (Google Brain), Michael W. Dusenberry (Google Brain), Mark van der Wilk (Prowler.io), Danijar Hafner (Google Brain)
Pseudocode | No | The paper includes code snippets in several figures (e.g., Figures 1, 3, 4, 5, 6, 7, and 8) but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code is available at https://github.com/google/edward2 as part of the edward2 namespace (see the usage sketch after this table).
Open Datasets | Yes | We implemented a Bayesian Transformer for the WMT14 EN-FR translation task.
Dataset Splits | No | The paper references datasets (the WMT14 EN-FR translation task and the 'cheetah' task for RL) but does not provide specific training/test/validation split percentages or sample counts.
Hardware Specification | Yes | As demonstration, we fit a 5-billion parameter Bayesian Transformer on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning.
Software Dependencies | Yes | Code snippets assume import edward2 as ed and import tensorflow as tf; the listed dependency is tensorflow==2.0.0.
Experiment Setup | No | The paper reports training times and memory usage (e.g., 'Training time for the deterministic Transformer takes roughly 13 hours; the Bayesian Transformer takes 16 hours and 2 extra GB per TPU.'), but it does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings needed for reproduction.
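
As a concrete illustration of the usage style shown in the paper's figures, the sketch below trains a small Bayesian multilayer perceptron with edward2's Bayesian Layers on top of Keras. This is a minimal sketch under stated assumptions, not the paper's experiment: the ed.layers.DenseReparameterization layer name follows the google/edward2 repository, while the toy data, network size, optimizer settings, and KL scaling are illustrative choices.

```python
# Minimal usage sketch of edward2's Bayesian Layers, assuming
# "pip install edward2 tensorflow". The layer name DenseReparameterization
# follows the google/edward2 repository; the toy data, model size, and
# training loop are illustrative assumptions, not the paper's setup.
import edward2 as ed
import tensorflow as tf

# Each DenseReparameterization layer places a variational normal posterior
# over its weights and registers its KL divergence term in model.losses.
model = tf.keras.Sequential([
    ed.layers.DenseReparameterization(64, activation="relu"),
    ed.layers.DenseReparameterization(1),
])

# Toy regression data (assumption, for illustration only).
num_examples = 256
features = tf.random.normal([num_examples, 10])
labels = tf.random.normal([num_examples, 1])

optimizer = tf.keras.optimizers.Adam(1e-3)

for step in range(200):
    with tf.GradientTape() as tape:
        predictions = model(features)  # weights are sampled on each call
        nll = tf.reduce_mean(tf.square(predictions - labels))
        kl = tf.add_n(model.losses) / num_examples  # scale KL by dataset size
        loss = nll + kl
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# Because the weights are stochastic, repeated forward passes differ; stacking
# several passes gives a Monte Carlo estimate of predictive uncertainty.
predictive_samples = tf.stack([model(features) for _ in range(10)])
predictive_mean = tf.reduce_mean(predictive_samples, axis=0)
predictive_stddev = tf.math.reduce_std(predictive_samples, axis=0)
```

The negative log-likelihood plus the scaled sum of model.losses corresponds to the usual variational (ELBO-style) training objective for reparameterization layers; whether and how to reweight the KL term is a modeling choice, not something this report's source specifies.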