Noisy Natural Gradient as Variational Inference

Authors: Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conducted a series of experiments to investigate the following questions: (1) How does noisy natural gradient (NNG) compare with existing methods in terms of prediction performance? (2) Is NNG able to scale to large datasets and modern-size convolutional neural networks? (3) Can NNG achieve better uncertainty estimates? (4) Does it enable more efficient exploration in active learning and reinforcement learning? Our method with a full-covariance multivariate Gaussian, a fully-factorized Gaussian, a matrix-variate Gaussian and a block-tridiagonal posterior are denoted as NNG-full, NNG-FFG (noisy Adam), NNG-MVG (noisy K-FAC) and NNG-BlkTri, respectively. 5.1. Regression: We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). All experiments used networks with one hidden layer unless stated otherwise. We compared our method with Bayes By Backprop (BBB) (Blundell et al., 2015) and probabilistic backpropagation (PBP) with a factorial Gaussian posterior (Hernández-Lobato & Adams, 2015). The results for PBP_MV (Sun et al., 2017) and VMG (Louizos & Welling, 2016) can be found in the supplement.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Toronto, Canada; (2) Vector Institute, Toronto, Canada.
Pseudocode | Yes | Algorithm 1 Noisy Adam. Algorithm 2 Noisy K-FAC.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that its source code is publicly available.
Open Datasets | Yes | We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). To evaluate the scalability of our method to large networks, we applied noisy K-FAC to a modified version of the VGG16 network (Simonyan & Zisserman, 2014) and tested it on CIFAR10 benchmark (Krizhevsky, 2009).
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test splits for the main datasets (UCI, CIFAR10), nor does it explicitly cite a standard split protocol for all datasets. While it mentions a specific setup for active learning ('randomly selected 20 labeled training examples and 100 unlabeled examples'), this is not a general train/val/test split for the entire dataset used in the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Algorithm 1 (Noisy Adam) requires: α (stepsize); β1, β2 (exponential decay rates for updating µ and the Fisher F); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term). Algorithm 2 (Noisy K-FAC) requires: α (stepsize); β (exponential moving average parameter); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term); statistics and inverse update intervals T_stats and T_inv.
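
To make the quantities in the pseudocode and experiment-setup rows above concrete, the following is a minimal sketch of an Adam-style update with adaptive weight noise, written in terms of the listed hyperparameters (α, β1, β2, λ, η, γ_ex) and the dataset size N. The function name noisy_adam_step, the grad_log_lik callback, and the exact placement of the damping and bias-correction terms are illustrative assumptions; the paper's Algorithm 1 (Noisy Adam) and Algorithm 2 (Noisy K-FAC) remain the authoritative procedures.

import numpy as np

def noisy_adam_step(mu, m, f, k, grad_log_lik, N,
                    alpha=1e-3, beta1=0.9, beta2=0.999,
                    lam=1.0, eta=1.0, gamma_ex=0.0):
    """One illustrative noisy-Adam-style step (a sketch, not the paper's exact Algorithm 1).

    mu : current posterior mean of the weights (NumPy array)
    m  : exponential moving average of gradients (momentum)
    f  : exponential moving average of squared gradients (diagonal Fisher estimate)
    k  : iteration counter, starting at 1 (used for bias correction)
    grad_log_lik : callback returning d/dw log p(y|x, w) on a minibatch
    N  : number of training examples
    lam, eta, gamma_ex : KL weighting, prior variance, extrinsic damping (as listed above)
    """
    gamma_in = lam / (N * eta)             # assumed intrinsic damping from the Gaussian prior
    gamma = gamma_in + gamma_ex            # total damping

    # Sample weights from a fully factorized Gaussian posterior whose variance is
    # derived from the damped diagonal Fisher estimate.
    sigma2 = lam / (N * (f + gamma))
    w = mu + np.sqrt(sigma2) * np.random.randn(*mu.shape)

    g = grad_log_lik(w)                    # stochastic gradient of the data log-likelihood
    v = g - gamma_in * w                   # fold in the (damped) prior gradient

    m = beta1 * m + (1.0 - beta1) * v      # momentum update
    f = beta2 * f + (1.0 - beta2) * g**2   # diagonal Fisher (squared-gradient) update
    m_hat = m / (1.0 - beta1**k)           # bias-corrected momentum

    mu = mu + alpha * m_hat / (f + gamma)  # preconditioned ascent step on the posterior mean
    return mu, m, f

A caller would keep mu, m, and f across iterations and increment k each step. The Kronecker-factored posterior covariance used by noisy K-FAC (Algorithm 2, with its β, T_stats, and T_inv parameters) is not captured by this diagonal sketch.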