Efficient Low Rank Gaussian Variational Inference for Neural Networks
Authors: Marcin Tomczak, Siddharth Swaroop, Richard Turner
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that adding low-rank terms to a parametrized diagonal covariance does not improve predictive performance except on small networks, but low-rank terms added to a constant diagonal covariance improve performance on both small and large-scale network architectures. In our experiments, we focus on demonstrating the following: (i) ELRG-D-VI improves over MF-VI for small networks, but not on larger models, (ii) ELRG-VI has better predictive performance than MF-VI, (iii) ELRG-VI scales up to large CNNs and provides better predictive distributions than MAP, MF-VI and MC Dropout, (iv) sharing variational samples as in [34, 41, 43] leads to poor predictive performance. (A hedged sketch of sampling from this low-rank-plus-constant-diagonal posterior appears after the table.) |
| Researcher Affiliation | Academia | Marcin B. Tomczak, University of Cambridge, Cambridge, CB2 1PZ, UK, mbt27@cam.ac.uk; Siddharth Swaroop, University of Cambridge, Cambridge, CB2 1PZ, UK, ss2163@cam.ac.uk; Richard E. Turner, University of Cambridge, Cambridge, CB2 1PZ, UK, ret26@cam.ac.uk |
| Pseudocode | No | The paper discusses algorithmic details and computational costs but does not contain any structured pseudocode or an explicitly labeled algorithm block. |
| Open Source Code | Yes | We open-source the implementation of the algorithm derived in this paper at https://github.com/marctom/elrgvi. |
| Open Datasets | Yes | We consider a two dimensional synthetic classification dataset... classify vectorized MNIST [26] images... We experiment with common simple computer vision benchmarks: MNIST, KUZUSHIJI [5], FASHIONMNIST [47] and CIFAR10 [23]... We consider 4 data sets: CIFAR10, CIFAR100 [23], SVHN [32] and STL10 (10 classes, 5000 images of size 96×96) [7]. |
| Dataset Splits | No | We plot the learning curves in Figure 2... avg. valid neg. log likelihood... Using K > 0 improves held-out log likelihood by a visible margin... The paper uses a validation set but does not provide explicit split percentages or sample counts for the training, validation, and test sets; it only reports performance metrics on them. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'ADAM optimizer [20]' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the default ADAM optimizer [20], batch size of 256, 1 variational sample per update, and run optimization for 500 epochs. We train all models for 500 epochs (except MAP, which is run for 50 epochs) using a batch size of 512 and the ADAM optimizer [20], without data augmentation. We train all algorithms for 200 epochs using a batch size of 256 and the ADAM optimizer [20], with data augmentation. (A hedged training-loop sketch matching these settings appears after the table.) |
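
The Research Type row quotes the paper's central finding: low-rank terms added to a constant diagonal covariance (ELRG-VI) improve predictive performance. The snippet below is a minimal sketch of reparameterized sampling from such a posterior, assuming a covariance of the form Σ = (1/K)·U·Uᵀ + v·I with a shared scalar variance v; the function name `sample_low_rank_gaussian` and its argument shapes are illustrative and not taken from the authors' released code at https://github.com/marctom/elrgvi.

```python
import torch

def sample_low_rank_gaussian(mu, U, log_v):
    """Draw one reparameterized sample from a low-rank-plus-constant-diagonal
    Gaussian, q(w) = N(mu, (1/K) * U @ U.T + v * I).

    mu:    (d,)   variational mean
    U:     (d, K) low-rank factor
    log_v: ()     log of the shared scalar variance v
    """
    d, K = U.shape
    eps_k = torch.randn(K)   # noise driving the rank-K component
    eps_d = torch.randn(d)   # isotropic noise driving the constant diagonal
    v = log_v.exp()
    # Cov[U @ eps_k / sqrt(K)] = (1/K) U U^T  and  Cov[sqrt(v) * eps_d] = v I
    return mu + (U @ eps_k) / K ** 0.5 + v.sqrt() * eps_d
```

In this sketch, the only overhead relative to a mean-field sample is a single d×K matrix-vector product per draw, which is what makes a small rank K cheap to add on top of large architectures.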
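The Experiment Setup row lists three training configurations. The loop below is a hedged sketch of the first one (default ADAM, batch size 256, one variational sample per update, 500 epochs); `model.neg_elbo` is a hypothetical method standing in for whatever negative-ELBO estimator the released code exposes, and the data loading is an assumption.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, epochs=500, batch_size=256, n_samples=1, device="cpu"):
    """Sketch of the quoted setup: default ADAM, batch size 256,
    one variational sample per gradient update, 500 epochs."""
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters())  # default ADAM hyperparameters
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            # `neg_elbo` is hypothetical: a Monte Carlo negative-ELBO estimate
            # computed with `n_samples` variational samples per update.
            loss = model.neg_elbo(x, y, n_samples=n_samples)
            loss.backward()
            optimizer.step()
    return model
```

The other two quoted configurations differ only in these hyperparameters: batch size 512 for 500 epochs without augmentation (50 epochs for MAP), and batch size 256 for 200 epochs with augmentation.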