Learning to Learn without Gradient Descent by Gradient Descent

Authors: Yutian Chen, Matthew W. Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present several experiments that show the breadth of generalization that is achieved by our learned algorithms. The experiments show that the learned optimizers can transfer to optimize a large and diverse set of black-box functions arising in global optimization, control, and hyper-parameter tuning.
Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Yutian Chen <yutianc@google.com>.
Pseudocode | No | The paper describes the black-box optimization loop in text, but it does not present it as a formal pseudocode block or algorithm. (A hedged sketch of that loop is given after this table.)
Open Source Code | No | The paper makes no explicit statement about releasing source code and provides no link to a code repository for the described methodology.
Open Datasets | Yes | We include the three standard benchmarks in the HPOLib package (Eggensperger et al., 2013): SVM, online LDA, and logistic regression with 3, 3, and 4 hyper-parameters respectively. We also consider the problem of training a 6-hyper-parameter residual network for classification on the CIFAR-100 dataset.
Dataset Splits | No | The paper trains the RNN optimizers on functions sampled from a GP prior and tests on benchmark functions and CIFAR-100, but it specifies no explicit training/validation/test splits, either for the optimizer's training data (the GP samples) or for the benchmark datasets, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper mentions "16 GPU hours" for one experiment, but gives no specific hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper names several models and optimizers (LSTMs, DNCs, Adam) and comparison packages (Spearmint, Hyperopt (TPE), SMAC), but it specifies version numbers for none of these software dependencies.
Experiment Setup | Yes | We train each RNN optimizer with trajectories of T steps, and update the RNN parameters using BPTT with Adam. We use a curriculum to increase the length of trajectories gradually from T = 10 to 100. For the first t ≤ N steps, we set o_{t-1} = 0, arbitrarily set the inputs to dummy values x_{t-1} = 0 and y_{t-1} = 0, and generate N parallel queries x_{1:N}. We also simulate a runtime τ_t ~ Uniform(1 − σ, 1 + σ) associated with the t-th query. (A training-setup sketch follows the loop sketch below.)
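
Since the paper presents its optimizer loop only in prose, here is a minimal, hypothetical Python sketch of that black-box optimization loop. `rnn_step`, `optimize`, and the random proposal inside `rnn_step` are placeholders of our own, not the authors' code: the trained LSTM/DNC optimizer was not released, so a uniform-random proposal stands in for it. The dummy `x_prev`/`y_prev` initialization mirrors the Experiment Setup text quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(h, x_prev, y_prev):
    """Stand-in for the learned recurrent optimizer (LSTM/DNC).

    A real implementation would update the hidden state h from the previous
    query/observation pair and emit the next query; here we propose a
    uniform-random point just so the sketch runs.
    """
    return h, rng.uniform(0.0, 1.0, size=x_prev.shape)

def optimize(f, dim, T=100):
    """Run the (placeholder) learned optimizer on black-box f for T steps."""
    h = None                    # RNN hidden state
    x_prev = np.zeros(dim)      # dummy initial query, as in the paper
    y_prev = 0.0                # dummy initial observation
    best_x, best_y = None, np.inf
    for _ in range(T):
        h, x = rnn_step(h, x_prev, y_prev)  # propose the next query point
        y = f(x)                            # one black-box evaluation
        if y < best_y:
            best_x, best_y = x, y
        x_prev, y_prev = x, y               # feed the observation back in
    return best_x, best_y

# Usage: minimize a toy quadratic with the placeholder optimizer.
x_star, y_star = optimize(lambda x: float(np.sum((x - 0.3) ** 2)), dim=2)
print(x_star, y_star)
```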
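
Likewise, the Experiment Setup excerpt implies a particular training scaffold: a curriculum that lengthens trajectories from T = 10 to 100, dummy inputs with N parallel initial queries, and simulated runtimes τ_t ~ Uniform(1 − σ, 1 + σ). The sketch below shows only that scaffolding under stated assumptions; `train_trajectory`, `steps_per_stage`, and the stage schedule are hypothetical, and the actual BPTT/Adam update on the RNN weights is elided.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_runtime(sigma=0.5):
    # Simulated runtime of the t-th query: tau_t ~ Uniform(1 - sigma, 1 + sigma).
    return rng.uniform(1.0 - sigma, 1.0 + sigma)

def train_trajectory(T, N=5, dim=2):
    """Hypothetical stand-in for one BPTT/Adam update over a T-step rollout."""
    # First t <= N steps: zero dummy inputs and N parallel queries x_{1:N}.
    x_prev, y_prev, o_prev = np.zeros(dim), 0.0, 0.0
    queries = rng.uniform(0.0, 1.0, size=(N, dim))
    # Accumulate simulated runtimes over the trajectory.
    elapsed = sum(simulate_runtime() for _ in range(T))
    # ... unroll the RNN for T steps and apply the BPTT/Adam update here ...
    return elapsed

def train_with_curriculum(T_start=10, T_end=100, stage=10, steps_per_stage=1000):
    # Curriculum: gradually lengthen training trajectories from T = 10 to 100.
    for T in range(T_start, T_end + 1, stage):
        for _ in range(steps_per_stage):
            train_trajectory(T)

train_with_curriculum(steps_per_stage=2)  # tiny run so the sketch executes
```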