Noisy Natural Gradient as Variational Inference

Authors: Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conducted a series of experiments to investigate the following questions: (1) How does noisy natural gradient (NNG) compare with existing methods in terms of prediction performance? (2) Is NNG able to scale to large datasets and modern-size convolutional neural networks? (3) Can NNG achieve better uncertainty estimates? (4) Does it enable more efficient exploration in active learning and reinforcement learning? Our method with a full-covariance multivariate Gaussian, a fully-factorized Gaussian, a matrix-variate Gaussian and a block-tridiagonal posterior are denoted as NNG-full, NNG-FFG (noisy Adam), NNG-MVG (noisy K-FAC) and NNG-BlkTri, respectively. 5.1. Regression: We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). All experiments used networks with one hidden layer unless stated otherwise. We compared our method with Bayes By Backprop (BBB) (Blundell et al., 2015) and probabilistic backpropagation (PBP) with a factorial Gaussian posterior (Hernández-Lobato & Adams, 2015). The results for PBP_MV (Sun et al., 2017) and VMG (Louizos & Welling, 2016) can be found in the supplement.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Toronto, Canada; (2) Vector Institute, Toronto, Canada.
Pseudocode | Yes | Algorithm 1 Noisy Adam. Algorithm 2 Noisy K-FAC.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that its source code is publicly available.
Open Datasets | Yes | We first experimented with regression datasets from the UCI collection (Asuncion & Newman, 2007). To evaluate the scalability of our method to large networks, we applied noisy K-FAC to a modified version of the VGG16 network (Simonyan & Zisserman, 2014) and tested it on CIFAR10 benchmark (Krizhevsky, 2009).
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test splits for the main datasets (UCI, CIFAR10), nor does it explicitly cite a standard split protocol for all datasets. While it mentions a specific setup for active learning ('randomly selected 20 labeled training examples and 100 unlabeled examples'), this is not a general train/val/test split for the entire dataset used in the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Algorithm 1 (Noisy Adam) requires: α (stepsize); β1, β2 (exponential decay rates for updating µ and the Fisher F); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term). Algorithm 2 (Noisy K-FAC) requires: α (stepsize); β (exponential moving average parameter); λ, η, γ_ex (KL weighting, prior variance, extrinsic damping term); statistics and inverse update intervals T_stats and T_inv.
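
To make the quantities in the pseudocode and experiment-setup rows above concrete, the following is a minimal sketch of an Adam-style update with adaptive weight noise, written in terms of the listed hyperparameters (α, β1, β2, λ, η, γ_ex) and the dataset size N. The function name noisy_adam_step, the grad_log_lik callback, and the exact placement of the damping and bias-correction terms are illustrative assumptions; the paper's Algorithm 1 (Noisy Adam) and Algorithm 2 (Noisy K-FAC) remain the authoritative procedures.

import numpy as np

def noisy_adam_step(mu, m, f, k, grad_log_lik, N,
                    alpha=1e-3, beta1=0.9, beta2=0.999,
                    lam=1.0, eta=1.0, gamma_ex=0.0):
    """One illustrative noisy-Adam-style step (a sketch, not the paper's exact Algorithm 1).

    mu : current posterior mean of the weights (NumPy array)
    m  : exponential moving average of gradients (momentum)
    f  : exponential moving average of squared gradients (diagonal Fisher estimate)
    k  : iteration counter, starting at 1 (used for bias correction)
    grad_log_lik : callback returning d/dw log p(y|x, w) on a minibatch
    N  : number of training examples
    lam, eta, gamma_ex : KL weighting, prior variance, extrinsic damping (as listed above)
    """
    gamma_in = lam / (N * eta)             # assumed intrinsic damping from the Gaussian prior
    gamma = gamma_in + gamma_ex            # total damping

    # Sample weights from a fully factorized Gaussian posterior whose variance is
    # derived from the damped diagonal Fisher estimate.
    sigma2 = lam / (N * (f + gamma))
    w = mu + np.sqrt(sigma2) * np.random.randn(*mu.shape)

    g = grad_log_lik(w)                    # stochastic gradient of the data log-likelihood
    v = g - gamma_in * w                   # fold in the (damped) prior gradient

    m = beta1 * m + (1.0 - beta1) * v      # momentum update
    f = beta2 * f + (1.0 - beta2) * g**2   # diagonal Fisher (squared-gradient) update
    m_hat = m / (1.0 - beta1**k)           # bias-corrected momentum

    mu = mu + alpha * m_hat / (f + gamma)  # preconditioned ascent step on the posterior mean
    return mu, m, f

A caller would keep mu, m, and f across iterations and increment k each step. The Kronecker-factored posterior covariance used by noisy K-FAC (Algorithm 2, with its β, T_stats, and T_inv parameters) is not captured by this diagonal sketch.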