Dropout Inference in Bayesian Neural Networks with Alpha-divergences

Authors: Yingzhen Li, Yarin Gal

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test the reparameterised BB-α on Bayesian NNs with the dropout approximation. We assess the proposed inference in regression and classification tasks on standard benchmarking datasets, comparing different values of α."
Researcher Affiliation | Academia | Yingzhen Li¹, Yarin Gal¹ ²; ¹University of Cambridge, UK; ²The Alan Turing Institute, UK.
Pseudocode | Yes | Figure 1 gives the induced classification loss:

def softmax_cross_ent_with_mc_logits(alpha):
    def loss(y_true, mc_logits):
        # mc_logits: MC samples of shape M x K x D
        mc_log_softmax = mc_logits \
            - K.max(mc_logits, axis=2, keepdims=True)
        mc_log_softmax = mc_log_softmax \
            - logsumexp(mc_log_softmax, 2)
        mc_ll = K.sum(y_true * mc_log_softmax, -1)
        return -1. / alpha * (logsumexp(alpha * mc_ll, 1)
                              + K.log(1.0 / K_mc))
    return loss

Figure 1. Code snippet for our induced classification loss.
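For intuition, here is a minimal self-contained sketch (our own illustration, not code from the paper) of the same α-reweighted MC log-likelihood loss, written with NumPy and SciPy's logsumexp; the function name, array shapes, and batch averaging are assumptions made for the example:

import numpy as np
from scipy.special import logsumexp

def mc_alpha_loss(y_true, mc_logits, alpha=0.5):
    # y_true:    one-hot labels, shape (M, D)
    # mc_logits: MC-sampled logits, shape (M, K, D)
    M, K, D = mc_logits.shape
    # log-softmax over the class dimension for every MC sample
    mc_log_softmax = mc_logits - logsumexp(mc_logits, axis=2, keepdims=True)
    # per-MC-sample log-likelihood of the true class, shape (M, K)
    mc_ll = np.sum(y_true[:, None, :] * mc_log_softmax, axis=-1)
    # -1/alpha * log( (1/K) * sum_k exp(alpha * ll_k) ), averaged over the batch
    return np.mean(-1.0 / alpha * (logsumexp(alpha * mc_ll, axis=1) + np.log(1.0 / K)))

# toy check: 2 data points, 10 MC samples, 3 classes
rng = np.random.default_rng(0)
y = np.eye(3)[[0, 2]]
print(mc_alpha_loss(y, rng.normal(size=(2, 10, 3))))

As α → 0 this objective recovers the standard MC-dropout (variational) training loss, while α = 1 corresponds to EP-like behaviour, which is why the paper compares several α values.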
Open Source Code | No | "A code snippet for our induced loss is given in Figure 1, with more details in the appendix." (Only a small snippet is shown, not the full codebase, and no explicit release statement or link is present.)
Open Datasets | Yes | "We use benchmark UCI datasets that have been tested in related literature" (http://archive.ics.uci.edu/ml/datasets.html), and "We further experiment with a classification task, comparing the accuracy of the various α values on the MNIST benchmark (LeCun & Cortes, 1998)."
Dataset Splits | No | "We summarise the test negative log-likelihood (LL) and RMSE with standard error (across different random splits, the lower the better) for selected datasets in Figure 2 and 3, respectively." and "The adversarial examples are generated on MNIST test data that is normalised to be in the range [0, 1]." (No explicit percentages or counts for train/val/test splits are given.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | "Implementing this induced loss with Keras (Chollet, 2015) is as simple as a few lines of Python." (No specific version numbers for Keras or Python are provided.)
Experiment Setup | Yes | "The model is a single-layer neural network with 50 ReLU units for all datasets except for Protein and Year, which use 100 units. We consider α ∈ {0.0, 0.5, 1.0}... MC approximation with K = 10 samples is also deployed... We used dropout probability 0.5 and α ∈ {0, 0.5, 1}. Again, we use K = 10 samples at training time for all α values, and K_test = 100 samples at test time. We use weight decay 10⁻⁶..."
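For context, the following is a minimal sketch of the kind of network the quoted setup describes, assuming tf.keras; the hidden size, dropout rate, and weight decay come from the quote above, while the function name, output head, and use of an L2 kernel regularizer as "weight decay" are illustrative assumptions:

import tensorflow as tf

def make_mc_dropout_net(input_dim, n_out=1, hidden_units=50,
                        dropout_rate=0.5, weight_decay=1e-6):
    reg = tf.keras.regularizers.l2(weight_decay)
    inputs = tf.keras.Input(shape=(input_dim,))
    h = tf.keras.layers.Dense(hidden_units, activation="relu",
                              kernel_regularizer=reg)(inputs)
    # training=True keeps dropout stochastic at prediction time, so
    # repeated forward passes yield MC samples from the approximate posterior
    h = tf.keras.layers.Dropout(dropout_rate)(h, training=True)
    outputs = tf.keras.layers.Dense(n_out, kernel_regularizer=reg)(h)
    return tf.keras.Model(inputs, outputs)

# e.g. K_test = 100 stochastic forward passes at test time:
# preds = [model(x_test) for _ in range(100)]

For the classification experiments the output layer would produce logits, with K such stochastic passes stacked (shape M x K x D) and fed to the Figure 1 loss during training.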