Bayesian Online Natural Gradient (BONG)

Authors: Matt Jones, Peter Chang, Kevin P. Murphy

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show empirically that our method outperforms other online VB methods in the non-conjugate setting, such as online learning for neural networks, especially when controlling for computational costs." (from Abstract) and "This section presents our primary experimental results. These are based on MNIST (D = 784, Ntrain = 60k, Ntest = 10k, C = 10 classes) [Le Cun et al., 2010]." (from Section 5)
Researcher Affiliation | Collaboration | Matt Jones (University of Colorado, mcjones@colorado.edu); Peter Chang (MIT, gyuyoung@mit.edu); Kevin Murphy (Google DeepMind, kpmurphy@google.com)
Pseudocode | Yes | "Algorithms 1 to 7 give pseudocode for applying the methods we study." (from Appendix A, "Abstract pseudocode")
Open Source Code | Yes | "Code for our experiments is available at https://github.com/petergchang/bong/."
Open Datasets | Yes | "These are based on MNIST (D = 784, Ntrain = 60k, Ntest = 10k, C = 10 classes) [Le Cun et al., 2010]." and "In addition to MNIST, we report experiments on the SARCOS regression dataset (D = 22, Ntrain = 44,484, Ntest = 4449, C = 1). This data is from https://gaussianprocess.org/gpml/data/."
Dataset Splits | No | "For methods that require a learning rate (i.e., all methods except BONG), we optimize it wrt mid-way or final performance on a holdout validation set, using Bayesian optimization on NLL." The paper does not state the size of this validation set or how it is carved out of the main dataset (a sketch of this tuning protocol is given after the table).
Hardware Specification | No | No specific hardware (GPU or CPU models, memory, or other machine specifications) used to run the experiments is given in the main text. The only mention is the checklist justification: "Experiments all run on any standard GPU/TPU so further details are not necessary." This is not a specific hardware specification.
Software Dependencies | No | The paper does not list the software dependencies (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiments.
Experiment Setup | Yes | "We apply these to a CNN with two convolutional layers (each with 16 features and a (5,5) kernel), followed by two linear layers (one with 64 features and the final one with 10 features), for a total of 57,722 parameters." and "For methods that require a learning rate (i.e., all methods except BONG), we optimize it wrt mid-way or final performance on a holdout validation set, using Bayesian optimization on NLL." and "All methods require specifying the prior belief state, p(θ0) = N(µ0, Σ0 = σ0²I)."
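
For concreteness, the following is a minimal sketch of the experiment setup quoted above: the two-conv, two-dense CNN and the isotropic Gaussian prior p(θ0) = N(µ0, σ0²I). This is not the authors' code; it assumes a Flax/JAX implementation (consistent with the linked JAX repository), SAME padding, ReLU activations, and 2x2 average pooling, choices which happen to reproduce the stated 57,722-parameter count. The σ0 value and the prior mean shown here are illustrative, not taken from the paper.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class SmallCNN(nn.Module):
    """Two conv layers (16 features, (5,5) kernels) + dense layers with 64 and 10 features."""
    @nn.compact
    def __call__(self, x):                                         # x: (batch, 28, 28, 1) MNIST images
        x = nn.relu(nn.Conv(features=16, kernel_size=(5, 5))(x))   # SAME padding (Flax default)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))    # 28x28 -> 14x14
        x = nn.relu(nn.Conv(features=16, kernel_size=(5, 5))(x))
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))    # 14x14 -> 7x7
        x = x.reshape((x.shape[0], -1))                            # flatten to 7*7*16 = 784
        x = nn.relu(nn.Dense(features=64)(x))
        return nn.Dense(features=10)(x)                            # logits for C = 10 classes

params = SmallCNN().init(jax.random.PRNGKey(0), jnp.ones((1, 28, 28, 1)))
flat = jnp.concatenate([p.ravel() for p in jax.tree_util.tree_leaves(params)])
print(flat.size)                 # 57,722 parameters under the assumptions above

# Prior belief state p(theta_0) = N(mu_0, sigma0^2 * I) over the flattened parameters.
mu0 = flat                       # e.g., centered at a random init (the paper's choice may differ)
sigma0 = 0.1                     # illustrative prior scale, not taken from the paper
Sigma0_diag = sigma0**2 * jnp.ones_like(mu0)
```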
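
The learning-rate selection quoted under Dataset Splits and Experiment Setup can be sketched as follows. The paper uses Bayesian optimization on validation NLL; the plain grid search below is a simplified stand-in, and the `run_method` callable, candidate grid, and holdout size are hypothetical placeholders, since the paper does not state the split.

```python
import jax
import jax.numpy as jnp

def holdout_split(X, y, n_val, key):
    """Randomly reserve n_val examples as a holdout validation set."""
    perm = jax.random.permutation(key, X.shape[0])
    val, train = perm[:n_val], perm[n_val:]
    return (X[train], y[train]), (X[val], y[val])

def tune_learning_rate(run_method, X, y, candidates, n_val=1000, seed=0):
    """Return the candidate learning rate with the lowest validation NLL.

    run_method(X_tr, y_tr, X_val, y_val, lr) -> validation NLL is a hypothetical
    wrapper around one online update method; the paper instead runs Bayesian
    optimization over this same objective.
    """
    (X_tr, y_tr), (X_val, y_val) = holdout_split(X, y, n_val, jax.random.PRNGKey(seed))
    nlls = jnp.array([run_method(X_tr, y_tr, X_val, y_val, lr) for lr in candidates])
    return candidates[int(jnp.argmin(nlls))]
```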