Learning Curve Prediction with Bayesian Neural Networks
Authors: Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, Frank Hutter
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 of the paper is titled "Experiments". |
| Researcher Affiliation | Academia | Aaron Klein, Stefan Falkner, Jost Tobias Springenberg & Frank Hutter, Department of Computer Science, University of Freiburg, {kleinaa,sfalkner,springj,fh}@cs.uni-freiburg.de |
| Pseudocode | No | The paper describes methods and processes but does not include a dedicated pseudocode block or a clearly labeled 'Algorithm X' section. |
| Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | Yes | CNN: "We sampled 256 configurations of 5 different hyperparameters of a 3-layer convolutional neural network (CNN) and trained each of them for 40 epochs on the CIFAR10 (Krizhevsky, 2009) benchmark." FCNet: "We sampled 4096 configurations of 10 hyperparameters of a 2-layer feed-forward neural network (FCNet) on MNIST (LeCun et al., 2001)..." (A data-loading sketch follows the table.) |
| Dataset Splits | Yes | "To estimate how well Bayesian neural networks perform in this task, we used the datasets from Section 3.1 and split all of them into 16 folds, allowing us to perform cross-validation of the predictive performance." (See the split sketch after this table.) |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper refers to various methods and tools (e.g., 'probabilistic back propagation', 'SGLD', 'SGHMC', 'random forests', 'emcee') by citing their originating papers, but it does not specify the version numbers of any software libraries, frameworks, or dependencies used in their implementation. |
| Experiment Setup | Yes | "For both networks, we used a 3-layer architecture with tanh activations and 64 units per layer. We also evaluate two different sampling methods for both types of networks: stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian MCMC (SGHMC), following the approach of Springenberg et al. (2016) to automatically adapt the noise estimate and the preconditioning of the gradients." Table 2 of the paper gives the hyperparameter configuration space of the four iterative methods; for the FCNet, the learning rate was decayed by a factor α_decay = (1 + γt)^κ, with γ and κ also sampled. (See the architecture and decay sketch after this table.) |
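
For the open-datasets row above, here is a minimal sketch of loading the two benchmarks named in the paper, CIFAR-10 and MNIST. The torchvision loaders and paths are assumptions for illustration; the paper does not state how the data was obtained.

```python
# Hedged sketch: loading CIFAR-10 and MNIST with torchvision (an assumption;
# the paper does not say which loading code was used).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR-10: benchmark for the 256 sampled CNN configurations (40 epochs each).
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

# MNIST: benchmark for the 4096 sampled FCNet configurations.
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)

print(len(cifar10), len(mnist))  # 50000 and 60000 training images
```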
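For the dataset-splits row, a sketch of the 16-fold cross-validation the paper describes. The use of scikit-learn's `KFold`, the shuffling, and the seed are assumptions; the paper only states that each dataset was split into 16 folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for the 256 sampled CNN configurations (5 hyperparameters each);
# in the paper, the data per configuration are the observed learning curves.
configs = np.random.RandomState(0).rand(256, 5)

kfold = KFold(n_splits=16, shuffle=True, random_state=0)  # shuffle/seed assumed
for fold, (train_idx, test_idx) in enumerate(kfold.split(configs)):
    # Train the learning-curve model on train_idx, evaluate on test_idx.
    print(f"fold {fold:2d}: {len(train_idx)} train / {len(test_idx)} test")
```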
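For the experiment-setup row, a sketch of the 3-layer, 64-unit tanh architecture and the FCNet learning-rate decay factor quoted above. The PyTorch framing, the input and output dimensions, and how the decay factor enters the schedule are assumptions; the SGLD/SGHMC samplers are not shown.

```python
import torch.nn as nn

# 3-layer tanh network with 64 units per layer, as stated in the paper.
# Input size (hyperparameters plus a time step) and the scalar output are
# assumptions for illustration.
model = nn.Sequential(
    nn.Linear(6, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),  # predicted learning-curve value
)

def lr_decay_factor(t: int, gamma: float, kappa: float) -> float:
    """FCNet decay factor alpha_decay = (1 + gamma * t) ** kappa; that the base
    learning rate is divided by this factor is an assumption from the wording."""
    return (1.0 + gamma * t) ** kappa
```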