Weight Uncertainty in Neural Network
Authors: Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning. |
| Researcher Affiliation | Industry | Charles Blundell CBLUNDELL@GOOGLE.COM Julien Cornebise JUCOR@GOOGLE.COM Koray Kavukcuoglu KORAYK@GOOGLE.COM Daan Wierstra WIERSTRA@GOOGLE.COM Google DeepMind |
| Pseudocode | Yes | Each step of optimisation proceeds as follows: 1. Sample ϵ ∼ N(0, I). 2. Let w = µ + log(1 + exp(ρ)) ∘ ϵ. 3. Let θ = (µ, ρ). 4. Let f(w, θ) = log q(w|θ) − log P(w)P(D|w). 5. Calculate the gradient with respect to the mean: ∆µ = ∂f(w, θ)/∂w + ∂f(w, θ)/∂µ. 6. Calculate the gradient with respect to the standard deviation parameter ρ: ∆ρ = ∂f(w, θ)/∂w · ϵ/(1 + exp(−ρ)) + ∂f(w, θ)/∂ρ. 7. Update the variational parameters: µ ← µ − α∆µ and ρ ← ρ − α∆ρ. (A runnable sketch of this update step follows the table.) |
| Open Source Code | No | The paper does not contain any explicit statements about making source code available, nor does it provide links to a code repository. |
| Open Datasets | Yes | We trained networks of various sizes on the MNIST digits dataset (Le Cun and Cortes, 1998), consisting of 60,000 training and 10,000 testing pixel images of size 28 by 28. |
| Dataset Splits | Yes | We trained on 50,000 digits and used 10,000 digits as a validation set, whilst Hinton et al. (2012) trained on 60,000 digits and did not use a validation set. |
| Hardware Specification | No | The paper mentions that 'all of the operations used are readily implemented on a GPU' but does not specify any particular GPU model (e.g., NVIDIA A100, Tesla V100), CPU model, memory, or any other specific hardware configuration details. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or their version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn, etc.) that would be necessary for replication. |
| Experiment Setup | Yes | We considered learning rates of 10^-3, 10^-4 and 10^-5 with minibatches of size 128. For Bayes by Backprop, we averaged over either 1, 2, 5, or 10 samples and considered π ∈ {1/4, 1/2, 3/4}, −log σ1 ∈ {0, 1, 2} and −log σ2 ∈ {6, 7, 8}. (A sketch of this scale mixture prior at one grid point follows the table.) |
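
The Pseudocode row above quotes the paper's per-step optimisation procedure. The following is a minimal sketch of that step, assuming PyTorch autograd, a toy one-weight Gaussian regression likelihood, and a single Gaussian prior in place of the paper's scale mixture; the data, hyperparameters, and variable names are illustrative, not the paper's MNIST setup.

```python
# Minimal Bayes-by-Backprop step (sketch): sample w via the reparameterisation
# w = mu + log(1 + exp(rho)) * eps, form f(w, theta) = log q(w|theta) - log P(w) - log P(D|w),
# and take a gradient step on the variational parameters (mu, rho).
import torch

torch.manual_seed(0)

# Toy regression data standing in for D (assumption, not the paper's dataset).
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
y = 2.0 * x + 0.1 * torch.randn_like(x)

# Variational parameters theta = (mu, rho) for a single weight.
mu = torch.zeros(1, requires_grad=True)
rho = torch.full((1,), -3.0, requires_grad=True)  # sigma = log(1 + exp(rho)) > 0

prior_sigma = 1.0   # simple Gaussian prior N(0, prior_sigma^2) (assumption)
noise_sigma = 0.1   # observation noise of the Gaussian likelihood (assumption)
alpha = 1e-2        # learning rate

for step in range(1000):
    # Steps 1-3: sample eps ~ N(0, I) and build w and theta.
    eps = torch.randn_like(mu)
    sigma = torch.log1p(torch.exp(rho))
    w = mu + sigma * eps

    # Step 4: f(w, theta) = log q(w|theta) - log P(w) - log P(D|w).
    log_q = torch.distributions.Normal(mu, sigma).log_prob(w).sum()
    log_prior = torch.distributions.Normal(0.0, prior_sigma).log_prob(w).sum()
    log_lik = torch.distributions.Normal(x * w, noise_sigma).log_prob(y).sum()
    f = log_q - log_prior - log_lik

    # Steps 5-6: autograd differentiates through the reparameterisation, giving
    # the same sum of partial derivatives as the quoted pseudocode.
    f.backward()

    # Step 7: mu <- mu - alpha * grad_mu, rho <- rho - alpha * grad_rho.
    with torch.no_grad():
        mu -= alpha * mu.grad
        rho -= alpha * rho.grad
        mu.grad.zero_()
        rho.grad.zero_()
```

After training on this toy data, `mu` drifts toward the true slope (about 2) while `log1p(exp(rho))` reports the remaining weight uncertainty.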
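The Open Datasets and Dataset Splits rows describe MNIST with a 50,000/10,000 train/validation split and minibatches of 128. A minimal loading sketch is below, assuming torchvision for the download and split; the paper does not state which tooling it used.

```python
# Sketch of the 50k/10k MNIST train/validation split (torchvision is an assumption).
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())  # 60,000 training digits
train_set, val_set = torch.utils.data.random_split(
    mnist, [50_000, 10_000], generator=torch.Generator().manual_seed(0))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128)
```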
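The Experiment Setup row lists the grid over the scale mixture prior hyperparameters π, σ1, σ2. The sketch below evaluates the log prior log P(w) = Σ_j log(π N(w_j; 0, σ1²) + (1 − π) N(w_j; 0, σ2²)) at one grid point, assuming σ is parameterised as e^(−k) for the quoted values of −log σ; names and the choice of grid point are illustrative.

```python
# Scale mixture Gaussian prior (sketch) at the grid point pi = 1/4, -log sigma1 = 1, -log sigma2 = 7.
import math
import torch

pi = 0.25
sigma1 = math.exp(-1.0)  # broad component
sigma2 = math.exp(-7.0)  # narrow component concentrated near zero

def log_scale_mixture_prior(w: torch.Tensor) -> torch.Tensor:
    """log P(w) = sum_j log( pi * N(w_j; 0, sigma1^2) + (1 - pi) * N(w_j; 0, sigma2^2) )."""
    p1 = torch.distributions.Normal(0.0, sigma1).log_prob(w).exp()
    p2 = torch.distributions.Normal(0.0, sigma2).log_prob(w).exp()
    # A logsumexp formulation would be more numerically stable for large |w|.
    return torch.log(pi * p1 + (1.0 - pi) * p2).sum()

print(log_scale_mixture_prior(torch.zeros(10)))  # all-zero weights: both components contribute
```

This term would replace the single Gaussian `log_prior` in the optimisation sketch above when reproducing the paper's prior.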