Dropout Inference in Bayesian Neural Networks with Alpha-divergences
Authors: Yingzhen Li, Yarin Gal
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the reparameterised BB-α on Bayesian NNs with the dropout approximation. We assess the proposed inference in regression and classification tasks on standard benchmarking datasets, comparing different values of α. |
| Researcher Affiliation | Academia | Yingzhen Li (1), Yarin Gal (1,2); 1: University of Cambridge, UK; 2: The Alan Turing Institute, UK. |
| Pseudocode | Yes | def softmax_cross_ent_with_mc_logits(alpha): def loss(y_true, mc_logits): # mc_logits: MC samples of shape M x K x D mc_log_softmax = mc_logits - K.max(mc_logits, axis=2, keepdims=True) mc_log_softmax = mc_log_softmax - logsumexp(mc_log_softmax, 2) mc_ll = K.sum(y_true*mc_log_softmax, -1) return -1./alpha * (logsumexp(alpha * mc_ll, 1) + K.log(1.0 / K_mc)) return loss Figure 1. Code snippet for our induced classification loss. (A runnable sketch of this loss is given below the table.) |
| Open Source Code | No | A code snippet for our induced loss is given in Figure 1, with more details in the appendix. (This implies a small part is shown, not the full codebase, and no explicit release statement or link is present.) |
| Open Datasets | Yes | We use benchmark UCI datasets that have been tested in related literature (http://archive.ics.uci.edu/ml/datasets.html). We further experiment with a classification task, comparing the accuracy of the various α values on the MNIST benchmark (LeCun & Cortes, 1998). |
| Dataset Splits | No | We summarise the test negative log-likelihood (LL) and RMSE with standard error (across different random splits, the lower the better) for selected datasets in Figures 2 and 3, respectively. The adversarial examples are generated on MNIST test data that is normalised to be in the range [0, 1]. (No explicit percentages or counts for train/validation/test splits are given.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | Implementing this induced loss with Keras (Chollet, 2015) is as simple as a few lines of Python. (No specific version number for Keras or Python is provided). |
| Experiment Setup | Yes | The model is a single-layer neural network with 50 ReLU units for all datasets except for Protein and Year, which use 100 units. We consider α ∈ {0.0, 0.5, 1.0}... MC approximation with K = 10 samples is also deployed... We used dropout probability 0.5 and α ∈ {0, 0.5, 1}. Again, we use K = 10 samples at training time for all α values, and Ktest = 100 samples at test time. We use weight decay 10⁻⁶... (A hedged sketch of this configuration follows the table.) |
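
For reference, the following is a minimal NumPy sketch of the induced loss quoted in the Pseudocode row (Figure 1 of the paper). It is rewritten outside the Keras backend so it runs standalone; the function name `alpha_mc_softmax_cross_entropy`, the (M, K, D) argument layout, and the demo values are our own illustrative choices, not the authors' released code.

```python
import numpy as np
from scipy.special import logsumexp


def alpha_mc_softmax_cross_entropy(y_true, mc_logits, alpha=0.5):
    """NumPy sketch of the BB-alpha induced classification loss (Figure 1).

    y_true:    one-hot labels, shape (M, D)
    mc_logits: MC-sampled logits, shape (M, K, D)
    alpha:     alpha-divergence parameter (must be non-zero here)
    """
    n_mc = mc_logits.shape[1]  # K MC samples per input
    # Numerically stable log-softmax over the class dimension.
    mc_log_softmax = mc_logits - mc_logits.max(axis=2, keepdims=True)
    mc_log_softmax = mc_log_softmax - logsumexp(mc_log_softmax, axis=2, keepdims=True)
    # Log-likelihood of the true class for each MC sample: shape (M, K).
    mc_ll = np.sum(y_true[:, None, :] * mc_log_softmax, axis=-1)
    # -1/alpha * log( 1/K * sum_k exp(alpha * ll_k) ), per input.
    return -1.0 / alpha * (logsumexp(alpha * mc_ll, axis=1) + np.log(1.0 / n_mc))


# Quick check with random inputs (M=4 inputs, K=10 MC samples, D=3 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10, 3))
labels = np.eye(3)[rng.integers(0, 3, size=4)]
print(alpha_mc_softmax_cross_entropy(labels, logits, alpha=0.5).mean())
```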
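The Experiment Setup row can likewise be read as a concrete configuration. The sketch below assumes TensorFlow 2's Keras API and encodes only the hyperparameters quoted above (50 ReLU units, dropout probability 0.5, weight decay 10⁻⁶, K MC samples); the optimiser, training schedule, and helper names (`make_mc_dropout_net`, `mc_logits`) are illustrative assumptions, not the authors' code.

```python
import tensorflow as tf


def make_mc_dropout_net(n_features, n_hidden=50, n_out=10, p_drop=0.5, weight_decay=1e-6):
    """Single hidden layer of ReLU units with dropout kept active at test time."""
    reg = tf.keras.regularizers.l2(weight_decay)
    inputs = tf.keras.Input(shape=(n_features,))
    h = tf.keras.layers.Dense(n_hidden, activation="relu", kernel_regularizer=reg)(inputs)
    # training=True keeps dropout stochastic at prediction time, as MC dropout requires.
    h = tf.keras.layers.Dropout(p_drop)(h, training=True)
    logits = tf.keras.layers.Dense(n_out, kernel_regularizer=reg)(h)
    return tf.keras.Model(inputs, logits)


def mc_logits(model, x, n_samples=10):
    """Stack K stochastic forward passes into shape (M, K, D), matching Figure 1."""
    return tf.stack([model(x, training=True) for _ in range(n_samples)], axis=1)
```

Under the quoted setup one would train with n_samples=10 and, since the paper reports Ktest = 100, evaluate with `mc_logits(model, x_test, n_samples=100)` before averaging the per-sample predictive probabilities.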