Accelerated Linearized Laplace Approximation for Bayesian Deep Learning
Authors: Zhijie Deng, Feng Zhou, Jun Zhu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive studies to show that ELLA can be a low-cost and effective baseline for Bayesian DL. We first describe how to specify the hyperparameters of ELLA, and use an illustrative regression task to demonstrate the effectiveness of ELLA (see Figure 1). We then experiment on standard image classification benchmarks to exhibit the superiority of ELLA over competing baselines in aspects of both performance and scalability. We further show that ELLA can even scale up to modern architectures like vision transformers (ViTs) [11]. |
| Researcher Affiliation | Collaboration | Qing Yuan Research Institute, Shanghai Jiao Tong University; Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch Joint ML Center, Tsinghua University; Center for Applied Statistics, School of Statistics, Renmin University of China; Pazhou Laboratory (Huangpu), Guangzhou, China |
| Pseudocode | Yes | Algorithm 1: Build the LLA posterior. and Algorithm 2: Build the approximate feature map. |
| Open Source Code | Yes | Code is available at https://github.com/thudzj/ELLA. |
| Open Datasets | Yes | We take 2000 MNIST images as training set X, and 256 others as validation set Xval. and Then, we evaluate ELLA on CIFAR-10 benchmark using ResNet architectures [18]. and We apply ELLA to ImageNet classification [8] to demonstrate its scalability. |
| Dataset Splits | Yes | We take 2000 MNIST images as training set X, and 256 others as validation set Xval. |
| Hardware Specification | Yes | Figure 4 (c) shows the comparison on the time used for predicting all CIFAR-10 test data (measured on an NVIDIA A40 GPU). |
| Software Dependencies | No | Prevalent DL libraries like PyTorch [47] and Jax [2] have already been armed with the capability for fwAD. (No specific version numbers are provided for PyTorch or Jax; see the fwAD sketch after the table.) |
| Experiment Setup | Yes | We simply set σ₀² to 1/(Nγ) with γ as the weight decay coefficient used for pretraining according to [10]. and Given these results, we set M = 2000 and K = 20 in the following experiments unless otherwise stated. and Regarding the setups, we use M = 2000 and K = 20 for ELLA; we use 20 MC samples to estimate the posterior predictive of MFVI-BF (as it incurs 20 NN forward passes), and use 512 ones for the other methods as stated. |
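
The Software Dependencies row notes that ELLA relies on forward-mode automatic differentiation (fwAD) in PyTorch or Jax, without pinning versions. As a point of reference only, the following is a minimal, hypothetical sketch of computing a Jacobian-vector product with PyTorch's `torch.autograd.forward_ad` (assuming PyTorch ≥ 1.11); the function `f` and the tensors `w`, `x`, `v` are illustrative stand-ins, not the authors' code.

```python
import torch
import torch.autograd.forward_ad as fwAD

def f(w, x):
    # Stand-in for a network output f(x; w): a single tanh layer,
    # summed over output units so each input yields a scalar.
    return torch.tanh(x @ w).sum(dim=-1)

x = torch.randn(5, 3)    # batch of 5 inputs
w = torch.randn(3, 2)    # "parameters"
v = torch.randn_like(w)  # direction in parameter space

# One forward pass yields the JVP: the Jacobian of f w.r.t. w applied to v.
with fwAD.dual_level():
    w_dual = fwAD.make_dual(w, v)
    out = f(w_dual, x)
    jvp = fwAD.unpack_dual(out).tangent  # shape (5,): one JVP entry per input

print(jvp)
```

Such a single forward pass replaces the backward pass that reverse-mode AD would need per output, which is the capability the quoted sentence refers to.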