Learning to Learn without Gradient Descent by Gradient Descent
Authors: Yutian Chen, Matthew W. Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several experiments that show the breadth of generalization that is achieved by our learned algorithms. The experiments show that the learned optimizers can transfer to optimize a large and diverse set of blackbox functions arising in global optimization, control, and hyper-parameter tuning. |
| Researcher Affiliation | Industry | ¹DeepMind, London, United Kingdom. Correspondence to: Yutian Chen <yutianc@google.com>. |
| Pseudocode | No | The paper describes the black-box optimization loop in text, but it does not present it as a formal pseudocode block or algorithm. (A hedged sketch of such a loop is included after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We include the three standard benchmarks in the HPOLib package (Eggensperger et al., 2013): SVM, online LDA, and logistic regression with 3, 3, and 4 hyper-parameters respectively. We also consider the problem of training a 6-hyper-parameter residual network for classification on the CIFAR-100 dataset. |
| Dataset Splits | No | The paper trains the RNN optimizers on functions sampled from a GP prior and tests on benchmark functions and CIFAR-100, but it does not specify explicit training/validation/test splits, either for the optimizer's training data (the GP samples) or for the benchmark datasets, in a way that would allow the data partitioning to be reproduced. |
| Hardware Specification | No | The paper mentions '16 GPU hours' for one experiment, but it provides no specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various models and optimizers (LSTMs, DNCs, Adam) and comparison packages (Spearmint, Hyperopt (TPE), SMAC), but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We train each RNN optimizer with trajectories of T steps, and update the RNN parameters using BPTT with Adam. We use a curriculum to increase the length of trajectories gradually from T = 10 to 100. For the first t ≤ N steps, we set o_{t−1} = 0, arbitrarily set the inputs to dummy values x_{t−1} = 0 and y_{t−1} = 0, and generate N parallel queries x_{1:N}. We also simulate a runtime τ_t ~ Uniform(1 − σ, 1 + σ) associated with the t-th query. (A hedged code sketch of this setup follows the table.) |
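
The Pseudocode and Experiment Setup rows describe the query loop and the training recipe only in prose. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: `RNNOptimizer`, `sample_function`, the quadratic stand-in for GP-sampled training functions, the hidden size, learning rate, batch size, per-stage iteration count, and `sigma` are all assumptions, and the summed-observation loss is only one of the training-loss variants the paper discusses.

```python
# Hypothetical sketch (not the authors' code): an LSTM "optimizer" proposes queries
# for a black-box function and is trained by BPTT with Adam, with a curriculum that
# lengthens the unrolled trajectory from T = 10 to T = 100.
import torch
import torch.nn as nn

DIM = 2      # search-space dimensionality (assumption)
HIDDEN = 64  # LSTM hidden size (assumption)

class RNNOptimizer(nn.Module):
    """Maps (previous query, previous observation) to the next query."""
    def __init__(self, dim=DIM, hidden=HIDDEN):
        super().__init__()
        self.cell = nn.LSTMCell(dim + 1, hidden)
        self.head = nn.Linear(hidden, dim)

    def rollout(self, f, T, batch):
        """Unroll T query/observe steps on black-box f; return the summed observations."""
        h = torch.zeros(batch, HIDDEN)
        c = torch.zeros(batch, HIDDEN)
        x = torch.zeros(batch, DIM)   # dummy x_{t-1} = 0 at the first step
        y = torch.zeros(batch, 1)     # dummy y_{t-1} = 0 at the first step
        loss = 0.0
        for _ in range(T):
            h, c = self.cell(torch.cat([x, y], dim=1), (h, c))
            x = self.head(h)          # next query x_t
            y = f(x)                  # observe f(x_t)
            loss = loss + y.mean()    # summed-observation loss (one of several variants)
        return loss

def sample_function(batch):
    """Stand-in for drawing a training function from a GP prior: a random quadratic bowl."""
    center = torch.randn(batch, DIM)
    return lambda x: ((x - center) ** 2).sum(dim=1, keepdim=True)

model = RNNOptimizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Curriculum: gradually lengthen trajectories from T = 10 to T = 100
# (the exact schedule is an assumption; the paper only states the endpoints).
for T in range(10, 101, 10):
    for _ in range(200):              # iterations per curriculum stage (assumption)
        f = sample_function(batch=32)
        loss = model.rollout(f, T, batch=32)
        opt.zero_grad()
        loss.backward()               # BPTT through the unrolled trajectory
        opt.step()

# The paper additionally simulates a per-query runtime for parallel proposals,
# tau_t ~ Uniform(1 - sigma, 1 + sigma); a stand-alone stand-in:
sigma = 0.25                          # assumption
tau = 1.0 - sigma + 2.0 * sigma * torch.rand(100)
```

To move the sketch closer to the reported setup, the quadratic stand-in would be replaced with functions sampled from a GP prior, and the simulated runtimes would be used to schedule the N parallel, asynchronous queries described in the Experiment Setup row.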