Gradient-based Hyperparameter Optimization through Reversible Learning

Authors: Dougal Maclaurin, David Duvenaud, Ryan P. Adams

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section shows several proof-of-concept experiments in which we can more richly parameterize training and regularization schemes in ways that would have been previously impractical to optimize. The network was trained on 10,000 examples of MNIST.
Researcher Affiliation | Academia | Dougal Maclaurin MACLAURIN@PHYSICS.HARVARD.EDU, David Duvenaud DDUVENAUD@SEAS.HARVARD.EDU, Ryan P. Adams RPA@SEAS.HARVARD.EDU
Pseudocode | Yes | Algorithm 1 Stochastic gradient descent with momentum (see the momentum sketch below the table)
Open Source Code | Yes | Since we required access to the internal logic of RMD in order to implement Algorithm 2, we implemented our own automatic differentiation package for Python, available at github.com/HIPS/autograd. (see the autograd sketch below the table)
Open Datasets | Yes | The network was trained on 10,000 examples of MNIST.
Dataset Splits | Yes | Figure 7 shows a training set, the pixels of which were optimized to improve performance on a validation set of 10,000 examples from MNIST.
Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, or cloud instance specifications) used to run the experiments.
Software Dependencies | No | The paper mentions 'Python' and 'Numpy (Oliphant, 2007)' as software used for their automatic differentiation package, but it does not specify version numbers for these software dependencies or for any other libraries.
Experiment Setup | Yes | Each meta-iteration trained a network for 100 iterations of SGD, meaning that the learning rate schedules were specified by 800 hyperparameters (100 iterations × 4 layers × 2 types of parameters)... The network was trained on 10,000 examples of MNIST, and had 4 layers, of sizes 784, 50, 50, and 50... We typically ran for 50 meta-iterations, and used a meta-step size of 0.04. (see the schedule sketch below the table)
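
The pseudocode row quotes the caption of the paper's Algorithm 1, stochastic gradient descent with momentum. The momentum sketch below illustrates that kind of update with a per-iteration learning rate alphas[t] and momentum decay gammas[t], the schedules the paper later optimizes; the grad_loss interface, the toy usage, and the exact (1 - gamma) damping are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def sgd_with_momentum(grad_loss, w0, alphas, gammas):
        """Sketch of SGD with momentum in the spirit of the paper's Algorithm 1.

        grad_loss(w, t) is assumed to return the training-loss gradient on
        minibatch t; alphas[t] and gammas[t] are the per-iteration learning
        rate and momentum schedules treated as hyperparameters in the paper.
        """
        w = np.array(w0, dtype=float)
        v = np.zeros_like(w)                             # velocity
        for t in range(len(alphas)):
            g = grad_loss(w, t)
            v = gammas[t] * v - (1.0 - gammas[t]) * g    # damped momentum update (assumed form)
            w = w + alphas[t] * v
        return w

    # Toy usage: minimize ||w||^2 with constant schedules over 100 iterations.
    w_final = sgd_with_momentum(lambda w, t: 2.0 * w, w0=np.ones(3),
                                alphas=np.full(100, 0.1), gammas=np.full(100, 0.9))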
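
The open-source-code row points at github.com/HIPS/autograd, the authors' reverse-mode automatic differentiation package for Python and NumPy. The autograd sketch below is a minimal usage example, not code from the paper: the toy logistic-regression loss and the random data are made up, while grad and autograd.numpy are the package's standard entry points.

    import autograd.numpy as np          # thinly wrapped NumPy
    import autograd.numpy.random as npr
    from autograd import grad

    def loss(weights, inputs, targets):
        # Toy logistic-regression loss, purely illustrative.
        preds = 1.0 / (1.0 + np.exp(-np.dot(inputs, weights)))
        return -np.mean(targets * np.log(preds) + (1 - targets) * np.log(1 - preds))

    grad_loss = grad(loss)               # reverse-mode gradient w.r.t. weights (first argument)

    inputs = npr.randn(20, 3)
    targets = (npr.rand(20) > 0.5).astype(float)
    print(grad_loss(np.zeros(3), inputs, targets))   # gradient array of shape (3,)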
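
The experiment-setup row implies a schedule tensor with 800 entries (100 SGD iterations × 4 layers × 2 parameter types per layer) that is updated for 50 meta-iterations with a meta-step size of 0.04. The schedule sketch below only shows that bookkeeping; the hypergradient, which the paper computes by exactly reversing SGD (its Algorithm 2), is stubbed out, and the log parameterization and plain gradient meta-update are assumptions rather than the authors' exact meta-optimizer.

    import numpy as np

    num_iters, num_layers, num_param_types = 100, 4, 2        # 100 * 4 * 2 = 800 hyperparameters
    meta_iters, meta_step_size = 50, 0.04

    # One learning rate per (SGD iteration, layer, weights-or-biases) entry,
    # stored as logs so meta-gradient steps keep the rates positive (assumed choice).
    log_alphas = np.full((num_iters, num_layers, num_param_types), np.log(0.1))

    def hypergradient(log_alphas):
        """Stub for d(validation loss)/d(log_alphas). The paper obtains this by
        running SGD forward and then reversing it exactly; here it returns zeros
        so that the meta-loop below is runnable."""
        return np.zeros_like(log_alphas)

    for meta_iter in range(meta_iters):
        d_log_alphas = hypergradient(log_alphas)                   # shape (100, 4, 2)
        log_alphas = log_alphas - meta_step_size * d_log_alphas    # plain meta-gradient step (illustrative)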