Riemannian Laplace approximations for Bayesian neural networks
Authors: Federico Bergamin, Pablo Moreno-Muñoz, Søren Hauberg, Georgios Arvanitidis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that our approach consistently improves over the conventional Laplace approximation across tasks. |
| Researcher Affiliation | Academia | Federico Bergamin, Pablo Moreno-Muñoz, Søren Hauberg, Georgios Arvanitidis Section for Cognitive Systems, DTU Compute, Technical University of Denmark {fedbe, pabmo, sohau, gear}@dtu.dk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce the results is publicly available at https://github.com/federicobergamin/riemannian-laplace-approximation |
| Open Datasets | Yes | We consider the toy-regression problem proposed by Snelson and Ghahramani [2005]. We consider a 2-dimensional binary classification problem using the banana dataset. UCI datasets. Image classification... MNIST and Fashion MNIST. |
| Dataset Splits | No | For the toy-regression problem, the paper states: 'we randomly pick 150 examples as our training set and the remaining 50 as a test set.' For image classification, it states: 'we subsample each dataset and we consider 5000 observations by keeping the proportionality of labels, and we test in the full test set containing 8000 examples.' Train and test splits are thus defined for some experiments, but no separate validation split is mentioned anywhere, which would be needed to reproduce any hyperparameter tuning. (A stratified subsampling sketch follows the table.) |
| Hardware Specification | No | The paper mentions that 'our implementation relies on an off-the-shelf ODE solver which runs on the CPU while our automatic-differentiation based approach runs on the GPU' but does not specify any particular CPU or GPU models, memory, or other hardware details. |
| Software Dependencies | No | The paper mentions using 'functorch [Horace He, 2021]' and 'scipy [Virtanen et al., 2020] implementation of the explicit Runge-Kutta method of order 5(4) [Dormand and Prince, 1980]' but does not provide the specific versions of functorch or scipy used in the experiments. (A solver sketch follows the table.) |
| Experiment Setup | Yes | We train both models using full-dataset GD, using a weight decay of 1e-2 for the larger model and 1e-3 for the smaller model for 35000 and 700000 epochs respectively. In both cases, we use a learning rate of 1e-3. We train a 2-layer fully connected neural net with 16 hidden units per layer and tanh activation using SGD for 2500 epochs. We use a learning rate of 1e-3 and weight decay of 1e-2. We train it using the Adam optimizer [Kingma and Ba, 2015] for 10000 epochs using a learning rate of 1e-3 and a weight decay of 1e-2. (A minimal training sketch follows the table.) |
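
The Dataset Splits row quotes a subsampling of 5000 observations 'by keeping the proportionality of labels'. Below is a minimal sketch of such a stratified subsample, assuming scikit-learn; the library choice, array names, and random seed are illustrative assumptions, since the paper does not say how the subsampling was implemented.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical full training arrays (e.g. flattened MNIST images and labels).
X = np.random.rand(60000, 784)
y = np.random.randint(0, 10, size=60000)

# Draw 5000 training examples while preserving the label proportions,
# mirroring the quoted "keeping the proportionality of labels".
X_sub, _, y_sub, _ = train_test_split(
    X, y, train_size=5000, stratify=y, random_state=0
)
```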
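
The Software Dependencies row cites scipy's explicit Runge-Kutta 5(4) (Dormand-Prince) method, which scipy exposes as `solve_ivp(..., method="RK45")`. The sketch below only shows how that solver is invoked; the right-hand side `rhs` is a placeholder and not the paper's geodesic ODE.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder dynamics standing in for the paper's ODE; here we solve dz/dt = -z.
def rhs(t, z):
    return -z

# "RK45" is scipy's explicit Runge-Kutta 5(4) (Dormand-Prince) integrator.
solution = solve_ivp(rhs, t_span=(0.0, 1.0), y0=np.ones(3), method="RK45")
print(solution.y[:, -1])  # state at the final integration time
```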
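
The quoted Experiment Setup describes a 2-layer fully connected network with 16 hidden units per layer and tanh activation, trained with SGD for 2500 epochs at a learning rate of 1e-3 and weight decay of 1e-2. Below is a minimal PyTorch sketch of that configuration, not the authors' code: the 1-dimensional input/output, the random placeholder data, and reading '2-layer' as two hidden layers are assumptions.

```python
import torch
import torch.nn as nn

# Placeholder regression data (the Snelson toy problem uses 150 training points).
train_x = torch.randn(150, 1)
train_y = torch.randn(150, 1)

# Two hidden layers, 16 units each, tanh activation (as quoted; layer count assumed).
model = nn.Sequential(
    nn.Linear(1, 16), nn.Tanh(),
    nn.Linear(16, 16), nn.Tanh(),
    nn.Linear(16, 1),
)

# SGD, lr 1e-3, weight decay 1e-2, 2500 epochs of full-batch training (quoted values).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(2500):
    optimizer.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```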