Accelerated Linearized Laplace Approximation for Bayesian Deep Learning
Authors: Zhijie Deng, Feng Zhou, Jun Zhu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive studies to show that ELLA can be a low-cost and effective baseline for Bayesian DL. We first describe how to specify the hyperparameters of ELLA, and use an illustrative regression task to demonstrate the effectiveness of ELLA (see Figure 1). We then experiment on standard image classification benchmarks to exhibit the superiority of ELLA over competing baselines in aspects of both performance and scalability. We further show that ELLA can even scale up to modern architectures like vision transformers (ViTs) [11]. |
| Researcher Affiliation | Collaboration | Qing Yuan Research Institute, Shanghai Jiao Tong University; Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch Joint ML Center, Tsinghua University; Center for Applied Statistics, School of Statistics, Renmin University of China; Pazhou Laboratory (Huangpu), Guangzhou, China |
| Pseudocode | Yes | Algorithm 1: Build the LLA posterior. and Algorithm 2: Build the approximate feature map. |
| Open Source Code | Yes | Code is available at https://github.com/thudzj/ELLA. |
| Open Datasets | Yes | We take 2000 MNIST images as training set X, and 256 others as validation set Xval. and Then, we evaluate ELLA on CIFAR-10 benchmark using ResNet architectures [18]. and We apply ELLA to ImageNet classification [8] to demonstrate its scalability. |
| Dataset Splits | Yes | We take 2000 MNIST images as training set X, and 256 others as validation set Xval. |
| Hardware Specification | Yes | Figure 4 (c) shows the comparison on the time used for predicting all CIFAR-10 test data (measured on an NVIDIA A40 GPU). |
| Software Dependencies | No | Prevalent DL libraries like PyTorch [47] and Jax [2] have already been armed with the capability for fwAD. (No specific version numbers are provided for PyTorch or Jax; see the fwAD sketch after the table.) |
| Experiment Setup | Yes | We simply set σ₀² to 1/(Nγ) with γ as the weight decay coefficient used for pretraining according to [10]. and Given these results, we set M = 2000 and K = 20 in the following experiments unless otherwise stated. and Regarding the setups, we use M = 2000 and K = 20 for ELLA; we use 20 MC samples to estimate the posterior predictive of MFVI-BF (as it incurs 20 NN forward passes), and use 512 ones for the other methods as stated. |
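
The Software Dependencies row notes that ELLA relies on forward-mode automatic differentiation (fwAD) in PyTorch or Jax, without pinning versions. As a point of reference only, the following is a minimal, hypothetical sketch of computing a Jacobian-vector product with PyTorch's `torch.autograd.forward_ad` (assuming PyTorch ≥ 1.11); the function `f` and the tensors `w`, `x`, `v` are illustrative stand-ins, not the authors' code.

```python
import torch
import torch.autograd.forward_ad as fwAD

def f(w, x):
    # Stand-in for a network output f(x; w): a single tanh layer,
    # summed over output units so each input yields a scalar.
    return torch.tanh(x @ w).sum(dim=-1)

x = torch.randn(5, 3)    # batch of 5 inputs
w = torch.randn(3, 2)    # "parameters"
v = torch.randn_like(w)  # direction in parameter space

# One forward pass yields the JVP: the Jacobian of f w.r.t. w applied to v.
with fwAD.dual_level():
    w_dual = fwAD.make_dual(w, v)
    out = f(w_dual, x)
    jvp = fwAD.unpack_dual(out).tangent  # shape (5,): one JVP entry per input

print(jvp)
```

Such a single forward pass replaces the backward pass that reverse-mode AD would need per output, which is the capability the quoted sentence refers to.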