Bayesian Layers: A Module for Neural Network Uncertainty
Authors: Dustin Tran, Michael W. Dusenberry, Mark van der Wilk, Danijar Hafner
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As demonstration, we fit a 5-billion parameter Bayesian Transformer on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning. |
| Researcher Affiliation | Industry | Dustin Tran (Google Brain), Michael W. Dusenberry (Google Brain), Mark van der Wilk (Prowler.io), Danijar Hafner (Google Brain) |
| Pseudocode | No | The paper includes code snippets in several figures (e.g., Figures 1, 3, 4, 5, 6, 7, 8) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code is available at https://github.com/google/edward2 as part of the edward2 namespace. |
| Open Datasets | Yes | We implemented a Bayesian Transformer for the WMT14 EN-FR translation task. |
| Dataset Splits | No | The paper references datasets (WMT14 EN-FR translation task, 'cheetah task' for RL) but does not provide specific training/test/validation split percentages or sample counts. |
| Hardware Specification | Yes | As demonstration, we fit a 5-billion parameter Bayesian Transformer on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning. |
| Software Dependencies | Yes | Code snippets assume `import edward2 as ed` and `import tensorflow as tf`, with tensorflow==2.0.0 (see the illustrative sketch after this table). |
| Experiment Setup | No | The paper reports training times and memory usage (e.g., 'Training time for the deterministic Transformer takes roughly 13 hours; the Bayesian Transformer takes 16 hours and 2 extra GB per TPU.'), but it does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings needed for reproduction. |
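
The dependency and code-availability rows above quote the paper but do not show how the pieces fit together. The sketch below is a minimal, illustrative example rather than the paper's experiment: it assumes the imports listed in the Software Dependencies row, that the edward2 namespace exposes a Bayesian dense layer of the form `ed.layers.DenseReparameterization` (layer names of this form appear in the paper's figures), and that each Bayesian layer registers its KL divergence in `model.losses` as a Keras regularization loss. The toy data, layer widths, learning rate, and step count are placeholders chosen for illustration.

```python
import edward2 as ed     # Bayesian Layers live in the edward2 namespace
import tensorflow as tf  # the paper's snippets assume tensorflow==2.0.0

# Toy classification data (illustrative only; not a dataset from the paper).
num_examples, num_features, num_classes = 1000, 20, 10
features = tf.random.normal([num_examples, num_features])
labels = tf.random.uniform([num_examples], maxval=num_classes, dtype=tf.int32)

# A small Bayesian feedforward model. Each reparameterization layer places a
# prior over its weights and adds its KL divergence to the model's losses.
model = tf.keras.Sequential([
    ed.layers.DenseReparameterization(64, activation="relu"),
    ed.layers.DenseReparameterization(num_classes),
])
model.build((None, num_features))  # create variables before tracing train_step

optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        nll = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
        kl = tf.add_n(model.losses) / num_examples  # KL penalty scaled per example
        loss = nll + kl  # negative evidence lower bound (ELBO)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for step in range(100):
    loss = train_step(features, labels)
```

Scaling the summed KL penalties by the dataset size makes the objective a per-example negative ELBO; this is a common convention adopted here for illustration, not a hyperparameter choice quoted from the paper.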