Bayesian Online Natural Gradient (BONG)
Authors: Matt Jones, Peter Chang, Kevin P. Murphy
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show empirically that our method outperforms other online VB methods in the non-conjugate setting, such as online learning for neural networks, especially when controlling for computational costs." (Abstract) and "This section presents our primary experimental results. These are based on MNIST (D = 784, Ntrain = 60k, Ntest = 10k, C = 10 classes) [LeCun et al., 2010]." (Section 5) |
| Researcher Affiliation | Collaboration | Matt Jones, University of Colorado (mcjones@colorado.edu); Peter Chang, MIT (gyuyoung@mit.edu); Kevin Murphy, Google DeepMind (kpmurphy@google.com) |
| Pseudocode | Yes | Appendix A ("Abstract pseudocode"): "Algorithms 1 to 7 give pseudocode for applying the methods we study." |
| Open Source Code | Yes | "Code for our experiments is available at https://github.com/petergchang/bong/." |
| Open Datasets | Yes | "These are based on MNIST (D = 784, Ntrain = 60k, Ntest = 10k, C = 10 classes) [LeCun et al., 2010]." and "In addition to MNIST, we report experiments on the SARCOS regression dataset (D = 22, Ntrain = 44,484, Ntest = 4449, C = 1). This data is from https://gaussianprocess.org/gpml/data/." |
| Dataset Splits | No | "For methods that require a learning rate (i.e., all methods except BONG), we optimize it wrt mid-way or final performance on a holdout validation set, using Bayesian optimization on NLL." The paper does not specify the size or percentage of this validation set, or how it is derived from the main dataset (a hypothetical split sketch follows the table). |
| Hardware Specification | No | No specific hardware details (e.g., GPU or CPU models, memory amounts) used for running the experiments are provided in the main text. The only mention is the checklist justification: "Experiments all run on any standard GPU/TPU so further details are not necessary." This is not a specific hardware specification. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | "We apply these to a CNN with two convolutional layers (each with 16 features and a (5,5) kernel), followed by two linear layers (one with 64 features and the final one with 10 features), for a total of 57,722 parameters." and "For methods that require a learning rate (i.e., all methods except BONG), we optimize it wrt mid-way or final performance on a holdout validation set, using Bayesian optimization on NLL." and "All methods require specifying the prior belief state, p(θ₀) = N(µ₀, Σ₀ = σ₀²I)." A model sketch consistent with this description follows the table. |
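
To make the quoted architecture concrete, here is a minimal sketch in JAX/Flax (the framework of the authors' repository) that reproduces the stated parameter count. SAME padding and 2×2 max-pooling after each convolution are assumptions, not stated in the quote, but they are the standard choices that yield exactly 57,722 parameters on 28×28 MNIST inputs. The isotropic Gaussian prior at the end follows p(θ₀) = N(µ₀, Σ₀ = σ₀²I); the value of σ₀ and centering the prior mean at the initialization are hypothetical choices for illustration.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from jax.flatten_util import ravel_pytree

class CNN(nn.Module):
    """Two conv layers (16 features, (5,5) kernels) + two linear layers (64, 10)."""
    @nn.compact
    def __call__(self, x):                                       # x: (batch, 28, 28, 1)
        x = nn.Conv(features=16, kernel_size=(5, 5))(x)          # SAME padding (assumed)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))  # 28x28 -> 14x14
        x = nn.Conv(features=16, kernel_size=(5, 5))(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))  # 14x14 -> 7x7
        x = x.reshape((x.shape[0], -1))                          # 7*7*16 = 784 features
        x = nn.relu(nn.Dense(features=64)(x))
        return nn.Dense(features=10)(x)                          # C = 10 classes

params = CNN().init(jax.random.PRNGKey(0), jnp.zeros((1, 28, 28, 1)))
theta0, unravel = ravel_pytree(params)
print(theta0.size)  # 57722, matching the paper's stated parameter count

# Prior belief state p(θ₀) = N(µ₀, Σ₀ = σ₀²I), stored as a diagonal.
sigma0 = 0.1                     # hypothetical value; the paper only says σ₀ must be specified
mu0 = theta0                     # centering the prior at the init is an assumption
Sigma0_diag = sigma0**2 * jnp.ones_like(theta0)
```

The parameter breakdown is 416 + 6,416 for the two conv layers and 50,240 + 650 for the two linear layers, which is how the padding and pooling assumptions can be checked against the quoted total of 57,722.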
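
For the unspecified validation split noted under Dataset Splits, the following sketch shows one plausible way a holdout set could be carved from the MNIST training data. The 10% size is invented for illustration; it is precisely the detail the paper omits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 60_000                  # MNIST training set size from the paper
n_val = 6_000                     # hypothetical 10% holdout; not specified in the paper
perm = rng.permutation(n_train)
val_idx, train_idx = perm[:n_val], perm[n_val:]
# Learning rates would then be tuned by Bayesian optimization on the NLL
# evaluated on val_idx (the paper does not name the BO library it uses).
```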