Variational Linearized Laplace Approximation for Bayesian Deep Learning

Authors: Luis A. Ortega, Simón Rodríguez Santana, Daniel Hernández-Lobato

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our proposed method against accelerated LLA (ELLA), which relies on the Nyström approximation, as well as other LLA variants employing the sample-then-optimize principle. Experimental results, both on regression and classification datasets, show that our method outperforms these already existing efficient variants of LLA, both in terms of the quality of the predictive distribution and in terms of total computational time.
Researcher Affiliation | Academia | Luis A. Ortega (1), Simón Rodríguez Santana (2), Daniel Hernández-Lobato (1). (1) Universidad Autónoma de Madrid; (2) Institute for Research in Technology (IIT), ICAI Engineering School, Universidad Pontificia Comillas.
Pseudocode | Yes | Algorithm 1 shows the structure of VaLLA's training loop, where no early stopping is considered. Using the kernels and A, it is easy to compute the KL in Eq. 15. As a result, the training loop is easy to implement, since q(f), given by Q_mean and Q_var, is easily computable as detailed in the algorithm.
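The row above only quotes the paper's high-level description of Algorithm 1. As a rough illustration of the kind of loop it describes (mini-batch ELBO training where Q_mean, Q_var and the KL term of Eq. 15 are computed from kernel matrices and the variational covariance A), here is a minimal sparse-GP-style sketch in PyTorch. It is not the authors' implementation: the neural tangent kernel of the linearized network is replaced by an RBF stand-in, and all names (kernel, Z, m, L, log_noise) are placeholders.

```python
# Minimal VaLLA-style training-loop sketch (NOT the released code).
# Assumptions: RBF kernel stands in for the NTK; Gaussian likelihood (regression).
import torch

torch.manual_seed(0)
X = torch.linspace(-3, 3, 200).unsqueeze(-1)            # toy inputs
y = torch.sin(X).squeeze(-1) + 0.1 * torch.randn(200)   # toy targets

def kernel(a, b, ls=1.0):
    """RBF stand-in for the linearized model's neural tangent kernel."""
    d2 = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)
    return torch.exp(-0.5 * d2 / ls ** 2)

M = 10                                                            # number of inducing points
Z = torch.linspace(-3, 3, M).unsqueeze(-1).requires_grad_(True)   # inducing locations
m = torch.zeros(M, requires_grad=True)                            # variational mean
L = torch.eye(M).requires_grad_(True)                             # Cholesky factor of A
log_noise = torch.tensor(-2.0, requires_grad=True)                # log observation noise

opt = torch.optim.Adam([Z, m, L, log_noise], lr=1e-2)

for step in range(2000):
    idx = torch.randint(0, X.shape[0], (100,))           # mini-batch of size 100
    xb, yb = X[idx], y[idx]

    Kzz = kernel(Z, Z) + 1e-5 * torch.eye(M)
    Kxz = kernel(xb, Z)
    Kzz_inv = torch.linalg.inv(Kzz)
    Lt = torch.tril(L)
    A = Lt @ Lt.T                                        # variational covariance

    # q(f) on the batch: Q_mean and Q_var, as in Algorithm 1
    B = Kxz @ Kzz_inv
    Q_mean = B @ m
    Q_var = torch.ones(xb.shape[0]) - (B * Kxz).sum(-1) + ((B @ A) * B).sum(-1)

    # expected Gaussian log-likelihood, rescaled to the full dataset
    noise = log_noise.exp()
    ell = (-0.5 * torch.log(2 * torch.pi * noise)
           - 0.5 * ((yb - Q_mean) ** 2 + Q_var) / noise).mean() * X.shape[0]

    # KL[q(u) || p(u)] between the two Gaussians (role of Eq. 15)
    logdet_A = 2 * torch.log(torch.diagonal(Lt).abs() + 1e-12).sum()
    kl = 0.5 * (torch.trace(Kzz_inv @ A) + m @ Kzz_inv @ m - M
                + torch.logdet(Kzz) - logdet_A)

    loss = kl - ell                                      # negative ELBO
    opt.zero_grad()
    loss.backward()
    opt.step()
```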
Open Source Code | Yes | VaLLA's code is available at https://github.com/Ludvins/Variational-LLA.
Open Datasets | Yes | (i) The Year dataset (UCI), with 515,345 instances and 90 features; we use the original train/test splits. (ii) The US flight delay (Airline) dataset (Dutordoir et al., 2020); following Ortega et al. (2023), we use the first 700,000 instances for training and the next 100,000 for testing. 8 features are considered: month, day of the month, day of the week, plane age, air time, distance, arrival time and departure time. (iii) The Taxi dataset, with data recorded in January 2023 (Salimbeni & Deisenroth, 2017). 9 attributes are considered: time of day, day of week, day of month, month, PULocationID, DOLocationID, distance and duration. We filter trips shorter than 10 seconds and longer than 5 hours, resulting in 3 million instances.
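The Airline slicing and Taxi filtering quoted above are straightforward to reproduce. Below is a hedged pandas sketch with tiny synthetic frames standing in for the real files; column names such as "duration" (in seconds) and "arrival_delay" are assumptions, not the authors' preprocessing code.

```python
# Hedged sketch of the Airline/Taxi preprocessing quoted above.
import numpy as np
import pandas as pd

# Airline: first 700,000 rows for training, next 100,000 for testing.
airline = pd.DataFrame({"arrival_delay": np.random.randn(1_000_000)})  # placeholder data
air_train = airline.iloc[:700_000]
air_test = airline.iloc[700_000:800_000]

# Taxi (January 2023): drop trips shorter than 10 seconds or longer than 5 hours.
taxi = pd.DataFrame({"duration": np.random.randint(1, 30_000, size=10_000)})  # seconds
taxi = taxi[(taxi["duration"] >= 10) & (taxi["duration"] <= 5 * 3600)]
```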
Dataset Splits | Yes | The first 80% is used as train data, the next 10% as validation data, and the last 10% as test data.
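For completeness, a minimal sketch of this sequential 80/10/10 split, assuming the data is already ordered; data is a placeholder array.

```python
import numpy as np

data = np.arange(1_000)                     # placeholder for the ordered dataset
n = len(data)
train = data[: int(0.8 * n)]                # first 80%
valid = data[int(0.8 * n): int(0.9 * n)]    # next 10%
test = data[int(0.9 * n):]                  # last 10%
```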
Hardware Specification | No | The paper does not specify the exact GPU models, CPU types, or other detailed hardware specifications used for running the experiments. It only mentions general computing resources in the acknowledgements: "Authors gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at Universidad Autónoma de Madrid."
Software Dependencies | No | The paper mentions specific software tools such as the Adam optimizer (Kingma & Ba, 2015) and the Laplace library (Daxberger et al., 2021a), but does not provide version numbers for any software, libraries, or programming languages.
Experiment Setup | Yes | VaLLA utilizes a batch size of 100. In regression, MNIST and FMNIST problems, we train our own DNN (standard multi-layer perceptron)... Hyper-parameters in all LLA variants (diagonal, KFAC, last-layer LLA) are optimized using the marginal log-likelihood estimate. Additional experimental details are given in Appendix F. ...we carry out 40,000 iterations of mini-batch size 100 in VaLLA. ...Adam optimizer (Kingma & Ba, 2015) with learning rate 10^-2 and weight decay 10^-2. ...learning rate 10^-3 and weight decay 10^-3.
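These settings translate directly into an optimizer configuration. The sketch below is a hedged illustration; valla_params is a placeholder for VaLLA's trainable parameters, not an API of the released code.

```python
# Hedged sketch of the optimisation settings quoted above.
import torch

valla_params = [torch.zeros(10, requires_grad=True)]  # placeholder parameters

batch_size = 100
num_iterations = 40_000

# Adam with learning rate 1e-2 and weight decay 1e-2; a 1e-3 / 1e-3
# configuration is also reported for some experiments.
optimizer = torch.optim.Adam(valla_params, lr=1e-2, weight_decay=1e-2)
```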