Black-Box Alpha Divergence Minimization
Authors: José Hernández-Lobato, Yingzhen Li, Mark Rowland, Thang Bui, Daniel Hernández-Lobato, Richard Turner
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP). Section 4 (Experiments): We evaluated the proposed algorithm, black-box alpha (BB-α), on regression and classification problems using a probit regression model and Bayesian neural networks. (See the objective sketch after the table.) |
| Researcher Affiliation | Academia | ¹Harvard University, ²University of Cambridge, ³Universidad Autónoma de Madrid |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for BB-α is publicly available (footnote 3: https://bitbucket.org/jmh233/code_black_box_alpha_icml_2016) |
| Open Datasets | Yes | Probit regression datasets from the UCI data repository (Lichman, 2013); the MNIST digit classification problem; data from the Harvard Clean Energy Project, which is the world's largest materials high-throughput virtual screening effort (Hachmann et al., 2014). |
| Dataset Splits | Yes | The performance of each method is evaluated on 50 random training and test splits of the data with 90% and 10% of the data instances, respectively. |
| Hardware Specification | No | The paper states 'We implemented the neural networks in Theano and ran the different methods on GPUs' but does not provide specific details such as GPU model numbers, CPU information, or memory specifications. |
| Software Dependencies | No | The paper mentions software like Theano and tools like Adam, but it does not specify version numbers for these software dependencies (e.g., 'Theano (Bastien et al., 2012)' without a version number). |
| Experiment Setup | Yes | The mean parameters of q are initialized by independently sampling from a zero-mean Gaussian with standard deviation 10⁻¹. We use −10 as the initial value for the log-variance parameters... We optimize the different objective functions using minibatches of size 32 and Adam (Kingma & Ba, 2014) with its default parameter values during 200 epochs... We use minibatches of size 250 and run the different methods for 250 epochs... a learning rate of 0.0001. The noise variance is fixed to 0.16. (See the setup sketch after the table.) |
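For context on the α settings quoted in the Research Type row: in the commonly used reparameterized form, the BB-α energy is L_α(q) = KL[q‖p] − (1/α) Σ_n log E_q[p(y_n | x_n, θ)^α], which recovers the variational Bayes objective as α → 0 and an EP-like objective at α = 1. Below is a minimal NumPy sketch of a Monte Carlo estimate of this energy on one minibatch, using a log-sum-exp over K posterior samples. The function name, array shapes, and the N/B minibatch rescaling of the data term are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def mc_bbalpha_energy(logliks, alpha, kl_q_p, N, B):
    """Monte Carlo estimate of the BB-alpha energy on one minibatch.

    logliks : (K, B) array of log p(y_n | x_n, theta_k) for K samples
              theta_k ~ q and B minibatch points.
    alpha   : divergence parameter (e.g. 0.5, nonzero); alpha -> 0
              recovers the VB objective.
    kl_q_p  : KL[q || prior] for the full parameter vector.
    N, B    : dataset size and minibatch size; rescaling the data term
              by N / B is an assumption about the minibatching scheme.
    """
    # log E_q[p(y_n | theta)^alpha] ~= logsumexp_k(alpha * loglik) - log K
    a = alpha * logliks
    m = a.max(axis=0)
    log_mean = m + np.log(np.exp(a - m).mean(axis=0))  # stable log-mean-exp
    data_term = -(N / B) * log_mean.sum() / alpha
    return data_term + kl_q_p
```

For the Gaussian-likelihood regression experiments, `logliks` would come from log N(y_n | ŷ_n(θ_k), σ²), with σ² the noise variance (fixed to 0.16 in the CEP experiment).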
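And a sketch of the initialization and optimizer settings quoted in the Experiment Setup row. The initialization values are taken directly from the paper; the seed and the shape of the config summary are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper reports 50 random splits

def init_q_params(num_weights):
    # Mean parameters of q sampled from a zero-mean Gaussian with
    # standard deviation 10^-1; log-variance parameters initialized
    # to -10 (both as reported in the paper).
    means = rng.normal(loc=0.0, scale=0.1, size=num_weights)
    log_vars = np.full(num_weights, -10.0)
    return means, log_vars

# Reported optimizer settings:
#   UCI regression: Adam (default parameters), minibatch size 32, 200 epochs.
#   CEP experiment: minibatch size 250, 250 epochs, learning rate 1e-4,
#                   noise variance fixed to 0.16.
```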