Learning to Learn without Gradient Descent by Gradient Descent
Authors: Yutian Chen, Matthew W. Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several experiments that show the breadth of generalization that is achieved by our learned algorithms. The experiments show that the learned optimizers can transfer to optimize a large and diverse set of blackbox functions arising in global optimization, control, and hyper-parameter tuning. |
| Researcher Affiliation | Industry | ¹DeepMind, London, United Kingdom. Correspondence to: Yutian Chen <yutianc@google.com>. |
| Pseudocode | No | The paper describes the black-box optimization loop in text, but it does not present it as a formal pseudocode block or algorithm. (A hedged sketch of such a loop is included after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We include the three standard benchmarks in the HPOLib package (Eggensperger et al., 2013): SVM, online LDA, and logistic regression with 3, 3, and 4 hyper-parameters respectively. We also consider the problem of training a 6-hyper-parameter residual network for classification on the CIFAR-100 dataset. |
| Dataset Splits | No | The paper trains the RNN optimizers on functions sampled from a GP prior and tests on benchmark functions and CIFAR-100, but it does not specify explicit training/validation/test splits, either for the optimizer's training data (the GP samples) or for the benchmark datasets, in a way that would allow the data partitioning to be reproduced. |
| Hardware Specification | No | The paper mentions '16 GPU hours' for one experiment, but it provides no specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various models and optimizers (LSTMs, DNCs, Adam) and comparison packages (Spearmint, Hyperopt (TPE), SMAC), but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We train each RNN optimizer with trajectories of T steps, and update the RNN parameters using BPTT with Adam. We use a curriculum to increase the length of trajectories gradually from T = 10 to 100. For the first t ≤ N steps, we set o_{t−1} = 0, arbitrarily set the inputs to dummy values x_{t−1} = 0 and y_{t−1} = 0, and generate N parallel queries x_{1:N}. We also simulate a runtime τ_t ~ Uniform(1 − σ, 1 + σ) associated with the t-th query. (A hedged code sketch of this setup follows the table.) |
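
The Pseudocode and Experiment Setup rows describe the query loop and the training recipe only in prose. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: `RNNOptimizer`, `sample_function`, the quadratic stand-in for GP-sampled training functions, the hidden size, learning rate, batch size, per-stage iteration count, and `sigma` are all assumptions, and the summed-observation loss is only one of the training-loss variants the paper discusses.

```python
# Hypothetical sketch (not the authors' code): an LSTM "optimizer" proposes queries
# for a black-box function and is trained by BPTT with Adam, with a curriculum that
# lengthens the unrolled trajectory from T = 10 to T = 100.
import torch
import torch.nn as nn

DIM = 2      # search-space dimensionality (assumption)
HIDDEN = 64  # LSTM hidden size (assumption)

class RNNOptimizer(nn.Module):
    """Maps (previous query, previous observation) to the next query."""
    def __init__(self, dim=DIM, hidden=HIDDEN):
        super().__init__()
        self.cell = nn.LSTMCell(dim + 1, hidden)
        self.head = nn.Linear(hidden, dim)

    def rollout(self, f, T, batch):
        """Unroll T query/observe steps on black-box f; return the summed observations."""
        h = torch.zeros(batch, HIDDEN)
        c = torch.zeros(batch, HIDDEN)
        x = torch.zeros(batch, DIM)   # dummy x_{t-1} = 0 at the first step
        y = torch.zeros(batch, 1)     # dummy y_{t-1} = 0 at the first step
        loss = 0.0
        for _ in range(T):
            h, c = self.cell(torch.cat([x, y], dim=1), (h, c))
            x = self.head(h)          # next query x_t
            y = f(x)                  # observe f(x_t)
            loss = loss + y.mean()    # summed-observation loss (one of several variants)
        return loss

def sample_function(batch):
    """Stand-in for drawing a training function from a GP prior: a random quadratic bowl."""
    center = torch.randn(batch, DIM)
    return lambda x: ((x - center) ** 2).sum(dim=1, keepdim=True)

model = RNNOptimizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Curriculum: gradually lengthen trajectories from T = 10 to T = 100
# (the exact schedule is an assumption; the paper only states the endpoints).
for T in range(10, 101, 10):
    for _ in range(200):              # iterations per curriculum stage (assumption)
        f = sample_function(batch=32)
        loss = model.rollout(f, T, batch=32)
        opt.zero_grad()
        loss.backward()               # BPTT through the unrolled trajectory
        opt.step()

# The paper additionally simulates a per-query runtime for parallel proposals,
# tau_t ~ Uniform(1 - sigma, 1 + sigma); a stand-alone stand-in:
sigma = 0.25                          # assumption
tau = 1.0 - sigma + 2.0 * sigma * torch.rand(100)
```

To move the sketch closer to the reported setup, the quadratic stand-in would be replaced with functions sampled from a GP prior, and the simulated runtimes would be used to schedule the N parallel, asynchronous queries described in the Experiment Setup row.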