Noisy Natural Gradient as Variational Inference
Authors: Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conducted a series of experiments to investigate the following questions: (1) How does noisy natural gradient (NNG) compare with existing methods in terms of prediction performance? (2) Is NNG able to scale to large datasets and modern-size convolutional neural networks? (3) Can NNG achieve better uncertainty estimates? (4) Does it enable more efficient exploration in active learning and reinforcement learning? Our methods with a full-covariance multivariate Gaussian, a fully factorized Gaussian, a matrix-variate Gaussian, and a block-tridiagonal posterior are denoted NNG-full, NNG-FFG (noisy Adam), NNG-MVG (noisy K-FAC), and NNG-BlkTri, respectively. 5.1. Regression: We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). All experiments used networks with one hidden layer unless stated otherwise. We compared our method with Bayes By Backprop (BBB) (Blundell et al., 2015) and probabilistic backpropagation (PBP) with a factorial Gaussian posterior (Hernández-Lobato & Adams, 2015). The results for PBP-MV (Sun et al., 2017) and VMG (Louizos & Welling, 2016) can be found in the supplement. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Toronto, Canada; Vector Institute, Toronto, Canada. |
| Pseudocode | Yes | Algorithm 1 Noisy Adam. Algorithm 2 Noisy K-FAC. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that its source code is publicly available. |
| Open Datasets | Yes | We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). To evaluate the scalability of our method to large networks, we applied noisy K-FAC to a modified version of the VGG16 network (Simonyan & Zisserman, 2014) and tested it on CIFAR10 benchmark (Krizhevsky, 2009). |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test splits for the main datasets (UCI, CIFAR10), nor does it explicitly cite a standard split protocol for all datasets. While it mentions a specific setup for active learning ('randomly selected 20 labeled training examples and 100 unlabeled examples'), this is not a general train/val/test split for the entire dataset used in experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Algorithm 1 (Noisy Adam) requires: α (stepsize); β1, β2 (exponential decay rates for updating µ and the Fisher F); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term). Algorithm 2 (Noisy K-FAC) requires: α (stepsize); β (exponential moving average parameter); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term); statistics and inverse update intervals T_stats and T_inv. Hedged sketches of both updates follow the table. |
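
To complement the pseudocode and setup rows above, the block below is a minimal NumPy sketch of one noisy-Adam-style update for a fully factorized Gaussian posterior, assuming a prior N(0, ηI), a KL weighting λ, N training examples, and a user-supplied `grad_log_lik` closure returning the minibatch estimate of ∇_w log p(y|x, w). The function name, default values, and the exact placement of the bias correction and damping terms are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def noisy_adam_step(mu, m, f, k, grad_log_lik, N,
                    alpha=1e-3, beta1=0.9, beta2=0.999,
                    lam=1.0, eta=1.0, gamma_ex=0.0, rng=np.random):
    """One noisy-Adam-style step for q(w) = N(mu, diag(sigma^2)).

    Hypothetical sketch: mu, m, f are arrays of the same shape (mean, gradient
    momentum, diagonal Fisher estimate); k is the 1-based iteration count.
    """
    gamma_in = lam / (N * eta)            # intrinsic damping induced by the prior
    gamma = gamma_in + gamma_ex           # total damping

    # Sample weights from the current posterior: sigma^2 = lam / (N * (f + gamma_in)).
    sigma = np.sqrt(lam / (N * (f + gamma_in)))
    w = mu + sigma * rng.standard_normal(mu.shape)

    g = grad_log_lik(w)                   # minibatch gradient of log p(y|x, w)
    m = beta1 * m + (1 - beta1) * (-g)    # momentum on the negative log-likelihood gradient
    f = beta2 * f + (1 - beta2) * g**2    # EMA of squared gradients (diagonal Fisher)

    m_hat = m / (1 - beta1**k)            # Adam-style bias correction
    m_tilde = m_hat + gamma_in * mu       # weight-decay-like term from the Gaussian prior
    mu = mu - alpha * m_tilde / (f + gamma)
    return mu, m, f
```

A training loop would call this once per minibatch with `k` incremented each call; at test time, predictions would average over several weight samples drawn with the same `sigma`.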
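
The noisy K-FAC row additionally lists the statistics and inverse update intervals T_stats and T_inv; the sketch below covers only the per-layer sampling step of a matrix-variate Gaussian (NNG-MVG style) posterior, assuming Kronecker factors A ≈ E[a aᵀ] over layer inputs and S ≈ E[g gᵀ] over pre-activation gradients that are maintained as exponential moving averages elsewhere. The helper name, the (d_out, d_in) weight layout, and the π-weighted damping split are assumptions for illustration.

```python
import numpy as np

def sample_mvg_weights(M, A, S, N, lam=1.0, gamma=1e-3, rng=np.random):
    """Sample layer weights from a matrix-variate Gaussian posterior (hypothetical sketch).

    M: (d_out, d_in) posterior mean; A: (d_in, d_in) input-activation factor;
    S: (d_out, d_out) pre-activation-gradient factor; gamma: total damping.
    """
    d_in, d_out = A.shape[0], S.shape[0]

    # K-FAC style factored damping: split gamma between the two factors via pi.
    pi = np.sqrt((np.trace(A) / d_in) / (np.trace(S) / d_out))
    A_damp = A + pi * np.sqrt(gamma) * np.eye(d_in)
    S_damp = S + (1.0 / pi) * np.sqrt(gamma) * np.eye(d_out)

    # Posterior covariance of vec(W) is approximately (lam / N) * (A_damp ⊗ S_damp)^{-1}.
    # Draw W = M + L_S @ E @ L_A.T with E ~ N(0, I), where
    # L_S L_S^T = (lam / N) * S_damp^{-1} and L_A L_A^T = A_damp^{-1}.
    L_S = np.linalg.cholesky((lam / N) * np.linalg.inv(S_damp))
    L_A = np.linalg.cholesky(np.linalg.inv(A_damp))
    E = rng.standard_normal(M.shape)
    return M + L_S @ E @ L_A.T
```

In a full noisy K-FAC loop, A and S would be refreshed every T_stats steps and their inverses every T_inv steps, with the sampled weights used both for the forward pass and for accumulating the Kronecker statistics.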