Variational Learning of Bayesian Neural Networks via Bayesian Dark Knowledge
Authors: Gehui Shen, Xi Chen, Zhihong Deng
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our methods, we conduct experiments on several datasets, including MNIST, SVHN and CIFAR10. Besides reporting classification accuracy, we show a series of evaluations of uncertainty prediction, including the uncertainty on OOD datasets as well as adversarial examples, calibration results and the active learning application. The compared baselines include a DNN trained by SGD, BNNs trained by two BDL methods: SGLD & BBB, EVBE [Pawlowski et al., 2017] baseline and two previous distillation methods: BDK [Balan et al., 2015] & APD [Wang et al., 2018]. |
| Researcher Affiliation | Academia | Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University {jueliangguke, mrcx, zhdeng}@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1: Two Proposed Distillation Methods |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | To demonstrate the effectiveness of our methods, we conduct experiments on several datasets, including MNIST, SVHN and CIFAR10. For MNIST, we select 10k training data for validation. We use a 2 layer MLP with 400 hidden units and ReLU activations as in previous work [Blundell et al., 2015; Balan et al., 2015] and treat notMNIST as OOD data following [Louizos and Welling, 2017]. This model has about 500k parameters. For SVHN and CIFAR10, we train the model on the first 5 classes (called SVHN5 and CIFAR5) and the data in the other 5 classes are considered as OOD data. We further select 10% data from training set for validation. We employ the larger LeNet architecture following Gal and Ghahramani [2016a] and Louizos and Welling [2017] and the model has about 5.74M parameters. |
| Dataset Splits | Yes | For MNIST, we select 10k training data for validation. We further select 10% data from training set for validation. |
| Hardware Specification | No | The paper mentions 'NVIDIA 1080Ti GPU with 11G memory' in the context of limitations for APD method, but does not explicitly describe the hardware used for its own experiments. |
| Software Dependencies | No | The paper mentions 'Cleverhans library [Goodfellow et al., 2016]' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all experiments, we use 1 posterior sample during training and 100 posterior samples during test when BBB is employed. For MNIST, we select 10k training data for validation. We use a 2 layer MLP with 400 hidden units and ReLU activations as in previous work [Blundell et al., 2015; Balan et al., 2015]. For SVHN and CIFAR10, we employ the larger LeNet architecture following Gal and Ghahramani [2016a] and Louizos and Welling [2017]. |
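The paper's stated parameter budget for the MNIST model can be sanity-checked directly: a 2-layer MLP with 400 hidden units on 784-dimensional MNIST inputs and 10 output classes. The layer sizes below are inferred from that description; the helper name is our own, not from the paper.

```python
def mlp_param_count(sizes):
    """Total weights + biases for a fully connected net with the
    given layer widths, e.g. [784, 400, 400, 10]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# MNIST input (28*28 = 784) -> 400 -> 400 -> 10, as described above.
n_params = mlp_param_count([784, 400, 400, 10])
print(n_params)  # 478410, consistent with "about 500k parameters"
```

This matches the paper's rounded figure of roughly 500k parameters for the MNIST MLP.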