Bayesian Optimization with Robust Bayesian Neural Networks
Authors: Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Freiburg {springj,kleinaa,sfalkner,fh}@cs.uni-freiburg.de |
| Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | An implementation of our method can be found at https://github.com/automl/RoBO. |
| Open Datasets | Yes | Concretely, we considered a set of 21 different classification datasets downloaded from the OpenML repository [26]. |
| Dataset Splits | No | The paper mentions using a 'validation' set for ResNet on CIFAR-10, and refers to 'following the protocol' for other datasets, but does not provide explicit percentages or sample counts for the training, validation, or test splits. The exact splitting methodology is not detailed within the paper's text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or memory specifications. It only mentions training times. |
| Software Dependencies | No | The paper discusses algorithmic components like SGHMC and BNNs, but does not provide specific software dependency names with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | Unless noted otherwise, we used a three-layer neural network with 50 tanh units for all experiments. For the priors we let p(θ_µ) = N(0, σ²_µ) be normally distributed and placed a Gamma hyperprior on σ²_µ, which is periodically updated via Gibbs sampling. For p(θ_σ²) we chose a log-normal prior. To approximate EI we used 50 samples acquired via SGHMC sampling. Maximization of the acquisition function was performed via gradient ascent. For the remainder of the paper, we fix ϵ = 10⁻² (a robust choice in our experience) and chose C such that ϵ V̂_θ^{1/2} C = 0.05 I (intuitively this corresponds to a constant decay in momentum of 0.05 per time step), potentially increasing it to satisfy the mentioned constraint at the end of the burn-in phase. |
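
The Experiment Setup row above states that EI was approximated from 50 networks drawn via SGHMC. The snippet below is a minimal sketch, assuming each sampled network yields a Gaussian predictive mean and variance at a candidate point, of how such a Monte-Carlo EI estimate can be computed; the function name, variable names, and toy numbers are illustrative and are not taken from the paper or the RoBO codebase.

```python
# Minimal sketch (not the authors' implementation): Monte-Carlo expected improvement
# for minimization, averaging the closed-form Gaussian EI over K posterior samples
# (e.g. the 50 SGHMC samples mentioned in the Experiment Setup row).
import numpy as np
from scipy.stats import norm

def monte_carlo_ei(mus, variances, y_best):
    """Average closed-form Gaussian EI over K posterior samples.

    mus, variances: arrays of shape (K,) holding the predictive mean/variance
    of each sampled network at a single candidate point.
    y_best: best (lowest) function value observed so far.
    """
    sigmas = np.sqrt(variances)
    gamma = (y_best - mus) / sigmas
    # EI under N(mu, sigma^2): sigma * (gamma * Phi(gamma) + phi(gamma))
    ei_per_sample = sigmas * (gamma * norm.cdf(gamma) + norm.pdf(gamma))
    return ei_per_sample.mean()

# Toy usage with K = 50 fabricated posterior samples at one candidate point.
rng = np.random.default_rng(0)
mus = rng.normal(0.5, 0.1, size=50)            # predictive means of the 50 samples
variances = rng.uniform(0.01, 0.05, size=50)   # predictive variances of the 50 samples
print(monte_carlo_ei(mus, variances, y_best=0.4))
```

In the paper, this acquisition value is then maximized over candidate inputs via gradient ascent; the sketch only evaluates it at a single point.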