Bayesian Optimization with Robust Bayesian Neural Networks

Authors: Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
Researcher Affiliation | Academia | Department of Computer Science, University of Freiburg, {springj,kleinaa,sfalkner,fh}@cs.uni-freiburg.de
Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | An implementation of our method can be found at https://github.com/automl/RoBO.
Open Datasets | Yes | Concretely, we considered a set of 21 different classification datasets downloaded from the OpenML repository [26]. (See the OpenML sketch after the table.)
Dataset Splits | No | The paper mentions using a 'validation' set for ResNet on CIFAR-10 and refers to 'following the protocol' for other datasets, but does not provide explicit percentages or sample counts for the training, validation, or test splits. The exact splitting methodology is not detailed within the paper's text.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or memory specifications; it only mentions training times.
Software Dependencies | No | The paper discusses algorithmic components such as SGHMC and BNNs, but does not provide specific software dependency names with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Unless noted otherwise, we used a three-layer neural network with 50 tanh units for all experiments. For the priors we let p(θ_µ) = N(0, σ²_µ) be normally distributed and placed a Gamma hyperprior on σ²_µ, which is periodically updated via Gibbs sampling. For p(θ_σ²) we chose a log-normal prior. To approximate EI we used 50 samples acquired via SGHMC sampling. Maximization of the acquisition function was performed via gradient ascent. For the remainder of the paper, we fix ε = 10⁻² (a robust choice in our experience) and choose C such that ε V̂_θ^(−1/2) C = 0.05 I (intuitively this corresponds to a constant decay in momentum of 0.05 per time step), potentially increasing it to satisfy the mentioned constraint at the end of the burn-in phase. (A sketch of this setup follows the table.)
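
Below the table, two illustrative sketches expand on the rows above; neither is taken from the paper or from RoBO. First, for the Open Datasets row: the quoted text only states that the 21 classification datasets were downloaded from OpenML, without listing their IDs, so the following Python snippet assumes the openml package and uses a placeholder dataset ID purely to show how such a dataset could be fetched.

    import openml

    # Placeholder ID: the 21 dataset IDs used in the paper are not listed in the quoted text.
    DATASET_ID = 31

    dataset = openml.datasets.get_dataset(DATASET_ID)
    X, y, categorical, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute
    )
    print(X.shape, y.shape)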
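
Second, for the Experiment Setup row: the sketch below illustrates, under stated assumptions, two ingredients of the quoted setup, namely the Monte-Carlo expected-improvement approximation over 50 SGHMC parameter samples (assuming minimization of the objective) and the choice of the friction constant C so that ε V̂_θ^(−1/2) C corresponds to a momentum decay of 0.05 per step. The helper predict_with_sample is a hypothetical stand-in for a forward pass of the three-layer, 50-unit tanh network under one sampled parameter vector; the authors' actual implementation is the RoBO code linked above.

    import numpy as np
    from math import erf, exp, pi, sqrt

    def mc_expected_improvement(x, theta_samples, y_best, predict_with_sample):
        """Expected improvement averaged over posterior samples of the BNN.

        Each sampled parameter vector (e.g. one of the 50 SGHMC samples) yields a
        Gaussian predictive distribution at x; the closed-form EI under each sample
        is averaged to approximate the acquisition value (minimization convention).
        """
        ei = 0.0
        for theta in theta_samples:
            mu, var = predict_with_sample(x, theta)    # hypothetical predictive mean/variance
            sigma = sqrt(var)
            gamma = (y_best - mu) / sigma              # standardized improvement
            pdf = exp(-0.5 * gamma ** 2) / sqrt(2.0 * pi)
            cdf = 0.5 * (1.0 + erf(gamma / sqrt(2.0)))
            ei += sigma * (gamma * cdf + pdf)          # closed-form EI for one sample
        return ei / len(theta_samples)

    def friction_constant(v_hat_diag, epsilon=1e-2, decay=0.05):
        """Diagonal friction C satisfying eps * V_hat^(-1/2) * C = decay * I,
        i.e. a constant momentum decay of `decay` per time step."""
        return decay * np.sqrt(v_hat_diag) / epsilon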