Variational Learning of Bayesian Neural Networks via Bayesian Dark Knowledge

Authors: Gehui Shen, Xi Chen, Zhihong Deng

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of our methods, we conduct experiments on several datasets, including MNIST, SVHN and CIFAR10. Besides classification accuracy, we report a series of uncertainty evaluations, including uncertainty on OOD datasets and adversarial examples, calibration results, and an active learning application. The compared baselines include a DNN trained by SGD, BNNs trained by two BDL methods (SGLD and BBB), the EVBE baseline [Pawlowski et al., 2017], and two previous distillation methods: BDK [Balan et al., 2015] and APD [Wang et al., 2018].
Researcher Affiliation | Academia | Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University {jueliangguke, mrcx, zhdeng}@pku.edu.cn
Pseudocode | Yes | Algorithm 1: Two Proposed Distillation Methods
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | To demonstrate the effectiveness of our methods, we conduct experiments on several datasets, including MNIST, SVHN and CIFAR10. For MNIST, we select 10k training data for validation. We use a 2-layer MLP with 400 hidden units and ReLU activations as in previous work [Blundell et al., 2015; Balan et al., 2015] and treat notMNIST as OOD data following [Louizos and Welling, 2017]. This model has about 500k parameters. For SVHN and CIFAR10, we train the model on the first 5 classes (called SVHN5 and CIFAR5) and the data in the other 5 classes are considered as OOD data. We further select 10% data from training set for validation. We employ the larger LeNet architecture following Gal and Ghahramani [2016a] and Louizos and Welling [2017], and the model has about 5.74M parameters. (A PyTorch sketch of the MLP appears after the table.)
Dataset Splits | Yes | For MNIST, we select 10k training data for validation. For SVHN5 and CIFAR5, we further select 10% data from the training set for validation. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions an 'NVIDIA 1080Ti GPU with 11G memory' in the context of the APD method's limitations, but does not explicitly describe the hardware used for its own experiments.
Software Dependencies | No | The paper mentions the 'Cleverhans library [Goodfellow et al., 2016]' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In all experiments, we use 1 posterior sample during training and 100 posterior samples during test when BBB is employed. For MNIST, we select 10k training data for validation. We use a 2-layer MLP with 400 hidden units and ReLU activations as in previous work [Blundell et al., 2015; Balan et al., 2015]. For SVHN and CIFAR10, we employ the larger LeNet architecture following Gal and Ghahramani [2016a] and Louizos and Welling [2017]. (See the Monte Carlo prediction sketch after the table.)
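
The "Open Datasets" row describes the MNIST model as an MLP with two hidden layers of 400 ReLU units. Below is a minimal PyTorch sketch of that architecture, not the authors' code; the class name and default arguments are illustrative, and the printed parameter count (~478k) is consistent with the "about 500k parameters" quoted above. The "larger LeNet" used for SVHN5/CIFAR5 is not specified in enough detail in this excerpt to sketch.

    import torch
    import torch.nn as nn

    class MnistMLP(nn.Module):
        """Two hidden layers of 400 units with ReLU, as described for the MNIST experiments."""

        def __init__(self, in_dim=784, hidden=400, n_classes=10):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_classes),
            )

        def forward(self, x):
            return self.net(x)

    model = MnistMLP()
    print(sum(p.numel() for p in model.parameters()))  # ~478k, matching "about 500k parameters"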
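
The split description (10k training images held out for MNIST validation; first 5 classes as in-distribution and the remaining 5 as OOD for SVHN/CIFAR10, with a further 10% validation split) can be reproduced roughly as follows. This is a sketch under assumptions: torchvision datasets are used, the splits are random (the paper excerpt does not say whether they are random or stratified), and SVHN is handled analogously via its `.labels` attribute.

    import torch
    from torch.utils.data import Subset, random_split
    from torchvision import datasets, transforms

    tfm = transforms.ToTensor()

    # MNIST: hold out 10k of the 60k training images for validation.
    mnist_full = datasets.MNIST("data", train=True, download=True, transform=tfm)
    mnist_train, mnist_val = random_split(mnist_full, [50_000, 10_000])

    # CIFAR5: classes 0-4 are in-distribution, classes 5-9 serve as OOD data.
    cifar_full = datasets.CIFAR10("data", train=True, download=True, transform=tfm)
    in_idx = [i for i, y in enumerate(cifar_full.targets) if y < 5]
    ood_idx = [i for i, y in enumerate(cifar_full.targets) if y >= 5]
    cifar5, cifar5_ood = Subset(cifar_full, in_idx), Subset(cifar_full, ood_idx)

    # A further 10% of the in-distribution training data is held out for validation.
    n_val = len(cifar5) // 10
    cifar5_train, cifar5_val = random_split(cifar5, [len(cifar5) - n_val, n_val])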
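
The "Experiment Setup" row states that BBB uses 1 posterior sample during training and 100 posterior samples at test time, and the "Research Type" row lists uncertainty evaluations on OOD data and adversarial examples. The sketch below shows the generic Monte Carlo recipe this implies: average the softmax outputs over repeated stochastic forward passes and score uncertainty with predictive entropy. It assumes `model(x)` re-samples the weights on every call (as a BBB layer does); it is not tied to the authors' implementation, and predictive entropy is only one of several uncertainty measures such papers report.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def mc_predict(model, x, n_samples=100):
        """Average the softmax output over n_samples stochastic forward passes,
        each assumed to draw a fresh posterior weight sample."""
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
        return probs.mean(dim=0)

    def predictive_entropy(probs, eps=1e-12):
        """Entropy of the averaged predictive distribution, a common score for
        OOD detection and calibration analyses."""
        return -(probs * (probs + eps).log()).sum(dim=-1)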