Bayesian Optimization with Robust Bayesian Neural Networks

Authors: Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
Researcher Affiliation | Academia | Department of Computer Science, University of Freiburg, {springj,kleinaa,sfalkner,fh}@cs.uni-freiburg.de
Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | An implementation of our method can be found at https://github.com/automl/RoBO.
Open Datasets | Yes | Concretely, we considered a set of 21 different classification datasets downloaded from the OpenML repository [26]. (See the OpenML sketch after the table.)
Dataset Splits | No | The paper mentions using a 'validation' set for ResNet on CIFAR-10 and refers to 'following the protocol' for other datasets, but does not provide explicit percentages or sample counts for the training, validation, or test splits. The exact splitting methodology is not detailed within the paper's text.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or memory specifications; it only mentions training times.
Software Dependencies | No | The paper discusses algorithmic components such as SGHMC and BNNs, but does not provide specific software dependency names with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Unless noted otherwise, we used a three-layer neural network with 50 tanh units for all experiments. For the priors we let p(θ_µ) = N(0, σ²_µ) be normally distributed and placed a Gamma hyperprior on σ²_µ, which is periodically updated via Gibbs sampling. For p(θ_σ²) we chose a log-normal prior. To approximate EI we used 50 samples acquired via SGHMC sampling. Maximization of the acquisition function was performed via gradient ascent. For the remainder of the paper, we fix ε = 10⁻² (a robust choice in our experience) and choose C such that ε V̂_θ^(−1/2) C = 0.05 I (intuitively this corresponds to a constant decay in momentum of 0.05 per time step), potentially increasing it to satisfy the mentioned constraint at the end of the burn-in phase. (A sketch of this setup follows the table.)
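
Below the table, two illustrative sketches expand on the rows above; neither is taken from the paper or from RoBO. First, for the Open Datasets row: the quoted text only states that the 21 classification datasets were downloaded from OpenML, without listing their IDs, so the following Python snippet assumes the openml package and uses a placeholder dataset ID purely to show how such a dataset could be fetched.

    import openml

    # Placeholder ID: the 21 dataset IDs used in the paper are not listed in the quoted text.
    DATASET_ID = 31

    dataset = openml.datasets.get_dataset(DATASET_ID)
    X, y, categorical, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute
    )
    print(X.shape, y.shape)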
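
Second, for the Experiment Setup row: the sketch below illustrates, under stated assumptions, two ingredients of the quoted setup, namely the Monte-Carlo expected-improvement approximation over 50 SGHMC parameter samples (assuming minimization of the objective) and the choice of the friction constant C so that ε V̂_θ^(−1/2) C corresponds to a momentum decay of 0.05 per step. The helper predict_with_sample is a hypothetical stand-in for a forward pass of the three-layer, 50-unit tanh network under one sampled parameter vector; the authors' actual implementation is the RoBO code linked above.

    import numpy as np
    from math import erf, exp, pi, sqrt

    def mc_expected_improvement(x, theta_samples, y_best, predict_with_sample):
        """Expected improvement averaged over posterior samples of the BNN.

        Each sampled parameter vector (e.g. one of the 50 SGHMC samples) yields a
        Gaussian predictive distribution at x; the closed-form EI under each sample
        is averaged to approximate the acquisition value (minimization convention).
        """
        ei = 0.0
        for theta in theta_samples:
            mu, var = predict_with_sample(x, theta)    # hypothetical predictive mean/variance
            sigma = sqrt(var)
            gamma = (y_best - mu) / sigma              # standardized improvement
            pdf = exp(-0.5 * gamma ** 2) / sqrt(2.0 * pi)
            cdf = 0.5 * (1.0 + erf(gamma / sqrt(2.0)))
            ei += sigma * (gamma * cdf + pdf)          # closed-form EI for one sample
        return ei / len(theta_samples)

    def friction_constant(v_hat_diag, epsilon=1e-2, decay=0.05):
        """Diagonal friction C satisfying eps * V_hat^(-1/2) * C = decay * I,
        i.e. a constant momentum decay of `decay` per time step."""
        return decay * np.sqrt(v_hat_diag) / epsilon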