Learning Curve Prediction with Bayesian Neural Networks

Authors: Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, Frank Hutter

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 EXPERIMENTS
Researcher Affiliation | Academia | Aaron Klein, Stefan Falkner, Jost Tobias Springenberg & Frank Hutter, Department of Computer Science, University of Freiburg, {kleinaa,sfalkner,springj,fh}@cs.uni-freiburg.de
Pseudocode | No | The paper describes methods and processes but does not include a dedicated pseudocode block or a clearly labeled 'Algorithm X' section.
Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository.
Open Datasets | Yes | CNN: We sampled 256 configurations of 5 different hyperparameters of a 3-layer convolutional neural network (CNN) and trained each of them for 40 epochs on the CIFAR10 (Krizhevsky, 2009) benchmark. FCNet: We sampled 4096 configurations of 10 hyperparameters of a 2-layer feed forward neural network (FCNet) on MNIST (Le Cun et al., 2001)...
Dataset Splits | Yes | To estimate how well Bayesian neural networks perform in this task, we used the datasets from Section 3.1 and split all of them into 16 folds, allowing us to perform cross-validation of the predictive performance.
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper refers to various methods and tools (e.g., 'probabilistic back propagation', 'SGLD', 'SGHMC', 'random forests', 'emcee') by citing their originating papers, but it does not specify the version numbers of any software libraries, frameworks, or dependencies used in their implementation.
Experiment Setup | Yes | For both networks, we used a 3-layer architecture with tanh activations and 64 units per layer. We also evaluate two different sampling methods for both types of networks: stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian MCMC (SGHMC), following the approach of Springenberg et al. (2016) to automatically adapt the noise estimate and the preconditioning of the gradients. Table 2: Hyperparameter configuration space of the four different iterative methods. For the FCNet we decayed the learning rate by a factor α_decay = (1 + γt)^(−κ) and also sampled different values for γ and κ.
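
The 'Open Datasets' row above describes benchmarks built by randomly sampling hyperparameter configurations (256 for the CNN on CIFAR10, 4096 for the FCNet on MNIST). The following is a minimal sketch of such sampling; the hyperparameter names and ranges are illustrative assumptions, not the configuration space from the paper's Table 2.

```python
import numpy as np

# Minimal sketch of random hyperparameter sampling as used to build the
# CNN benchmark (256 configurations, 5 hyperparameters). Names and ranges
# are illustrative assumptions, not the paper's actual search space.
rng = np.random.RandomState(0)

def sample_cnn_config():
    return {
        "learning_rate": float(10 ** rng.uniform(-6, -1)),  # log-uniform (assumed)
        "batch_size": int(2 ** rng.randint(4, 10)),          # assumed range
        "n_filters_1": int(2 ** rng.randint(4, 7)),          # assumed range
        "n_filters_2": int(2 ** rng.randint(4, 7)),          # assumed range
        "n_filters_3": int(2 ** rng.randint(4, 7)),          # assumed range
    }

configs = [sample_cnn_config() for _ in range(256)]  # 256 configurations for the CNN benchmark
print(configs[0])
```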
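The 'Dataset Splits' row quotes a 16-fold cross-validation over the sampled configurations. Below is a minimal sketch of such a split using NumPy only; fitting and evaluating the learning curve model is omitted.

```python
import numpy as np

# Minimal sketch of a 16-fold split over sampled configurations, used to
# cross-validate predictive performance; model fitting itself is omitted.
rng = np.random.RandomState(0)
n_configs = 256                          # e.g. the CNN benchmark
folds = np.array_split(rng.permutation(n_configs), 16)

for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # fit the learning curve model on train_idx, evaluate on test_idx here
    print(f"fold {k:2d}: {len(train_idx)} train / {len(test_idx)} test configurations")
```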
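The 'Experiment Setup' row describes a 3-layer tanh network with 64 units per layer and, for the FCNet benchmark, a learning-rate decay factor α_decay = (1 + γt)^(−κ). The sketch below writes both down in plain NumPy, treating the architecture as three hidden layers of 64 units each; it illustrates the stated architecture and decay form only, not the authors' implementation (which additionally samples the network weights with SGLD/SGHMC).

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass of a 3-hidden-layer tanh network with a scalar output head."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

def lr_decay(t, gamma, kappa):
    """FCNet learning-rate decay factor, assuming the form (1 + gamma*t)**(-kappa)."""
    return (1.0 + gamma * t) ** (-kappa)

# Example dimensions: a few input features, 64 units in each of the 3 hidden layers.
rng = np.random.RandomState(0)
dims = [5, 64, 64, 64, 1]
weights = [rng.randn(d_in, d_out) * 0.1 for d_in, d_out in zip(dims[:-1], dims[1:])]
biases = [np.zeros(d_out) for d_out in dims[1:]]

print(forward(rng.randn(4, dims[0]), weights, biases).shape)  # -> (4, 1)
print(lr_decay(t=10, gamma=0.1, kappa=0.5))
```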