Handling the Positive-Definite Constraint in the Bayesian Learning Rule

Authors: Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms existing methods without any significant increase in computation. Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces. Section 6 (Numerical Results) also points to the implementation: github.com/yorkerlin/iBayesLRule
Researcher Affiliation | Academia | (1) University of British Columbia, Vancouver, Canada; (2) CIFAR AI Chair, Alberta Machine Intelligence Institute, Canada; (3) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. Correspondence to: Wu Lin <wlin2018@cs.ubc.ca>.
Pseudocode | Yes | Figure 1: Our improved Bayesian learning rule solves an implementation issue with an existing algorithm known as VOGN (Khan et al., 2018), shown on the left. VOGN is an Adam-like optimizer which gives state-of-the-art results on large deep-learning problems (Osawa et al., 2019). However, it requires us to store individual gradients within a minibatch, which makes the algorithm slow (shown in blue in lines 3 and 6). This is necessary for the scaling vector ŝ to obtain a good estimate of uncertainty. Our work fixes this issue using the improved Bayesian learning rule: our Adam-like optimizer (shown on the right) only requires the average over the minibatch (see line 3), and line 6 is simply changed to use the reparametrization trick with the averaged gradient. The additional terms added to the Bayesian learning rule are shown in red in line 8. These changes fix the implementation issue of VOGN without significantly increasing the computation cost, and due to our modification the scaling vector ŝ always remains positive. A small difference is that the mean µ is updated before ŝ in our optimizer (see lines 7 and 8), while in VOGN the order is reversed; this difference shows that NGD depends on the parameterization. (A hedged code sketch of this update is given after this table.)
Open Source Code | Yes | Our implementation: github.com/yorkerlin/iBayesLRule
Open Datasets | Yes | We first visualize Gaussian approximations with full covariance structures for the Bayesian logistic regression example taken from Murphy (2013) (N = 60, d = 2). Results on real data: Abalone dataset (N = 4,177, d = 8), Ionosphere dataset (N = 351, d = 34), CyTOF dataset (N = 522,656, d = 40), CIFAR-10 dataset (N = 60,000, d = 3 × 32 × 32).
Dataset Splits | Yes | We train the network with diagonal Gaussian approximations on the CIFAR-10 dataset (N = 60,000, d = 3 × 32 × 32), with 50,000 images for training and 10,000 images for validation.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models or other specifications of the machines used to run the experiments.
Software Dependencies | No | The paper mentions software such as the Adam optimizer but does not specify version numbers for it or for any other software dependencies.
Experiment Setup | Yes | Abalone dataset: We train the model with mini-batch size 168. CIFAR-10 dataset: We train the model with mini-batch size 128 and compare our Adam-like update (referred to as iBayesLRule-Adam) to VOGN; we use the same initialization and hyperparameters in both methods.
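
Below is a hedged sketch of the update described in the Pseudocode row (Figure 1, right): it estimates the diagonal Hessian from the averaged minibatch gradient via the reparametrization trick (the change at line 6 of the figure) and adds the extra second-order term (the red term at line 8) so that the scaling vector stays positive. The toy quadratic loss, the grad_loss helper, the step size, and the omission of momentum, prior terms, and data-size scaling are illustrative assumptions of this sketch, not the paper's exact algorithm.

```python
# Minimal NumPy sketch of a diagonal-Gaussian, iBayesLRule-style update.
# Assumptions: toy quadratic loss, no momentum, no prior term, no N-scaling.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: the loss is 0.5 * w^T A w - b^T w with diagonal curvature A.
d = 5
A = np.diag(rng.uniform(0.5, 2.0, size=d))
b = rng.normal(size=d)

def grad_loss(w):
    # Stands in for the gradient averaged over a minibatch (line 3 of Figure 1).
    return A @ w - b

# Variational parameters: mean mu and diagonal precision s (must stay > 0).
mu = np.zeros(d)
s = np.ones(d)
beta = 0.1

for _ in range(200):
    # Sample from q(w) = N(mu, diag(1/s)) and compute the averaged gradient.
    sigma = 1.0 / np.sqrt(s)
    w = mu + sigma * rng.normal(size=d)
    g = grad_loss(w)

    # Reparametrization-trick (Stein/Price) estimate of the diagonal Hessian
    # using only the averaged gradient: h_i = g_i * (w_i - mu_i) / sigma_i^2.
    h = g * (w - mu) / sigma**2

    # Mean update, preconditioned by the precision (mu is updated before s,
    # matching the ordering noted in the Figure 1 caption).
    mu = mu - beta * g / s

    # Precision update with the extra second-order ("red") term:
    #   s_new = s + beta*(h - s) + (beta^2 / 2) * (h - s)^2 / s
    #         = ((s + beta*(h - s))^2 + s^2) / (2 * s) > 0,
    # so s stays elementwise positive even when the Hessian estimate h is negative.
    gs = h - s
    s = s + beta * gs + 0.5 * beta**2 * gs**2 / s

print("mean:", mu)        # fluctuates around the minimizer A^{-1} b
print("precision:", s)    # remains elementwise positive throughout
```

The algebraic identity in the last comment is what removes the positive-definite concern: for any sampled gradient, the updated precision is a sum of two nonnegative quantities divided by the positive current precision, which is consistent with the caption's statement that ŝ always remains positive while only the averaged minibatch gradient is used.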