Gradient-based Hyperparameter Optimization through Reversible Learning
Authors: Dougal Maclaurin, David Duvenaud, Ryan Adams
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section shows several proof-of-concept experiments in which we can more richly parameterize training and regularization schemes in ways that would have been previously impractical to optimize. The network was trained on 10,000 examples of MNIST |
| Researcher Affiliation | Academia | Dougal Maclaurin MACLAURIN@PHYSICS.HARVARD.EDU David Duvenaud DDUVENAUD@SEAS.HARVARD.EDU Ryan P. Adams RPA@SEAS.HARVARD.EDU |
| Pseudocode | Yes | Algorithm 1 Stochastic gradient descent with momentum (a hedged sketch of this update rule appears after the table) |
| Open Source Code | Yes | Since we required access to the internal logic of RMD in order to implement Algorithm 2, we implemented our own automatic differentiation package for Python, available at github.com/HIPS/autograd. (A short usage illustration follows the table.) |
| Open Datasets | Yes | The network was trained on 10,000 examples of MNIST |
| Dataset Splits | Yes | Figure 7 shows a training set, the pixels of which were optimized to improve performance on a validation set of 10,000 examples from MNIST. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, or cloud instance specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Python' and 'Numpy (Oliphant, 2007)' as software used for their automatic differentiation package, but it does not specify any version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | Each meta-iteration trained a network for 100 iterations of SGD, meaning that the learning rate schedules were specified by 800 hyperparameters (100 iterations × 4 layers × 2 types of parameters)... The network was trained on 10,000 examples of MNIST, and had 4 layers, of sizes 784, 50, 50, and 50... We typically ran for 50 meta-iterations, and used a meta-step size of 0.04. (The layout of this schedule is sketched after the table.) |
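
The Pseudocode row quotes "Algorithm 1: Stochastic gradient descent with momentum." The sketch below illustrates the general form of that update, with a distinct learning rate and momentum decay allowed at every step, which is what makes per-iteration schedules into hyperparameters. The toy quadratic loss and the specific constants are assumptions for illustration, not values from the paper.

```python
# Minimal SGD-with-momentum sketch; per-step alphas/gammas are the hyperparameters.
import numpy as np

def sgd_with_momentum(grad_fn, w0, alphas, gammas):
    """Run SGD with momentum, allowing a distinct alpha/gamma at every step."""
    w, v = w0.copy(), np.zeros_like(w0)
    for t, (alpha, gamma) in enumerate(zip(alphas, gammas)):
        g = grad_fn(w, t)                  # (stochastic) gradient at step t
        v = gamma * v - (1.0 - gamma) * g  # update the velocity
        w = w + alpha * v                  # take a step along the velocity
    return w

# Toy usage: minimize ||w||^2 with a fixed 100-step schedule (illustrative only).
T = 100
w_final = sgd_with_momentum(
    grad_fn=lambda w, t: 2.0 * w,          # gradient of the toy quadratic loss
    w0=np.ones(5),
    alphas=np.full(T, 0.1),
    gammas=np.full(T, 0.9),
)
print(w_final)
```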
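The Open Source Code row points to github.com/HIPS/autograd. The snippet below is a brief usage illustration of that package's public interface (`autograd.numpy` plus `grad`); the quadratic objective is a made-up example, not from the paper.

```python
# Illustrative use of the HIPS autograd package referenced in the table.
import autograd.numpy as np   # thinly wrapped NumPy that records operations
from autograd import grad

def loss(w):
    return np.sum(w ** 2)     # simple quadratic loss for illustration

grad_loss = grad(loss)        # returns a function computing dloss/dw
print(grad_loss(np.array([1.0, -2.0, 3.0])))   # -> [ 2. -4.  6.]
```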
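Finally, the Experiment Setup row's count of 800 hyperparameters follows from one learning rate per SGD iteration, per layer, per parameter type. The array layout below is a back-of-the-envelope sketch of that bookkeeping; the shapes and the 0.1 fill value are assumptions based on the quoted description, not the authors' actual data structures.

```python
# Sketch of the 100 x 4 x 2 = 800 elementwise learning-rate hyperparameters.
import numpy as np

num_iterations = 100    # SGD iterations per meta-iteration
num_layers = 4          # network of sizes 784, 50, 50, 50
num_param_types = 2     # weights and biases

schedule = np.full((num_iterations, num_layers, num_param_types), 0.1)
print(schedule.size)    # 100 * 4 * 2 = 800
```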