A Simple Guard for Learned Optimizers
Authors: Isabeau Prémont-Schwarz, Jaroslav Vítků, Jan Feyereisl
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments, This chapter compares the proposed LGL2O with the original GL2O, non-guarded L2O and baseline hand-crafted algorithms. Then we follow with out-of-distribution experiments. First with only the dataset being out of distribution (other than MNIST), then with only the optimizee being out of distribution (ConvNets instead of MLPs), and finally both. We prove that our guard keeps the convergence guarantee of the designed optimizer. We show theoretical proof of LGL2O's convergence guarantee. |
| Researcher Affiliation | Industry | GoodAI, Prague, Czechia. Correspondence to: Isabeau Prémont-Schwarz <premont-schwarz@goodai.com>, Jaroslav Vítků <jaroslav.vitku@goodai.com>. |
| Pseudocode | Yes | Algorithm 1: Loss-Guarded L2O with (deterministic) gradient descent, Algorithm 2: Loss-Guarded L2O with stochastic gradient descent (a hedged sketch of such a guarded step appears after the table) |
| Open Source Code | No | The paper does not provide any specific links to open-source code or explicit statements about code availability for the described methodology. |
| Open Datasets | Yes | The experiments were conducted on publicly available datasets, namely MNIST (LeCun & Cortes, 2010), Fashion MNIST (Xiao et al., 2017), CIFAR10 (Krizhevsky, 2009), Tiny ImageNet, a subset of the ImageNet dataset (Russakovsky et al., 2015), and simple datasets from the Scikit-learn library (Pedregosa et al., 2011). The Tiny ImageNet dataset is publicly available from the Kaggle competition website at https://www.kaggle.com/c/tinyimagenet/data |
| Dataset Splits | Yes | Sample n_t train mini-batches B_t = [b_1, ..., b_{n_t}]; sample n_c validation mini-batches B_v = [v_1, ..., v_{n_c}]. In all our experiments, both n_t and n_c are chosen to be 10. |
| Hardware Specification | Yes | Every run was executed on a single NVIDIA GPU with between 4 GB and 12 GB of memory. |
| Software Dependencies | Yes | All experiments were coded in Python 3.9 with PyTorch 1.8.1 on CUDA 11.0. The Scikit-learn datasets were loaded from Scikit-learn version 0.24.0. |
| Experiment Setup | Yes | In all our experiments, both n_t and n_c are chosen to be 10. The learned optimizer (and its weights) is identical in all experiments and consists of an LSTM (Hochreiter & Schmidhuber, 1997) with 2 hidden layers of 20 cells each and a linear output layer, which was meta-trained with a rollout length of 100 steps to optimize an MLP on the MNIST dataset. The best learning rates for Adam were found for each combination of optimizee (MLP or ConvNet) and dataset from the set [0.0001, 0.001, 0.01, 0.1] over 300 optimization steps. In the case of SGD, the learning rate was set based on practical experience to 3.0 and the (optional) momentum to 0.9. (A hedged PyTorch sketch of this optimizer architecture follows the table.) |
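
The pseudocode row names Algorithm 1 (Loss-Guarded L2O with deterministic gradient descent) but does not reproduce it, so the following is only a minimal Python sketch of what such a guarded step could look like, assuming the guard applies both the learned-optimizer update and the hand-crafted fallback update and keeps whichever scores lower on the sampled validation mini-batches; `learned_opt_update`, `sgd_update`, and `loss_fn` are hypothetical callables, not the paper's API.

```python
import copy
import torch

def loss_guarded_step(model, learned_opt_update, sgd_update, val_batches, loss_fn):
    """Hypothetical sketch of one loss-guarded step: apply both candidate
    updates to copies of the model, score each on the n_c sampled validation
    mini-batches (n_c = 10 in the paper), and keep the better candidate."""

    def mean_val_loss(candidate):
        # Average loss over the sampled validation mini-batches B_v.
        with torch.no_grad():
            losses = [loss_fn(candidate(x), y) for x, y in val_batches]
        return torch.stack(losses).mean()

    l2o_model = copy.deepcopy(model)   # candidate from the learned (L2O) optimizer
    learned_opt_update(l2o_model)

    sgd_model = copy.deepcopy(model)   # candidate from the hand-crafted fallback
    sgd_update(sgd_model)

    # Guard: keep whichever candidate achieves the lower mean validation loss.
    if mean_val_loss(l2o_model) <= mean_val_loss(sgd_model):
        return l2o_model
    return sgd_model
```

In this sketch the guard can only ever replace the learned update with the fallback update, which is the intuition behind the quoted claim that the guard preserves the hand-crafted optimizer's convergence guarantee.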
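
The experiment-setup row fixes the learned optimizer's architecture (an LSTM with 2 hidden layers of 20 cells each plus a linear output layer). A hedged PyTorch sketch of such a module follows; the per-parameter input features and the interpretation of the output are assumptions, since the table does not state them.

```python
import torch
import torch.nn as nn

class L2OOptimizerNet(nn.Module):
    """Sketch of the learned optimizer described in the setup row: an LSTM
    with 2 hidden layers of 20 cells each, followed by a linear output layer.
    The input features per parameter (e.g. the gradient) are an assumption."""

    def __init__(self, input_size=1, hidden_size=20, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers)
        self.head = nn.Linear(hidden_size, 1)  # one proposed update per parameter

    def forward(self, grad_features, hidden=None):
        # grad_features: (seq_len, n_params, input_size) per-parameter inputs.
        out, hidden = self.lstm(grad_features, hidden)
        return self.head(out), hidden


# Minimal smoke test with a fabricated input (shapes are illustrative only).
net = L2OOptimizerNet()
updates, state = net(torch.randn(1, 128, 1))
print(updates.shape)  # torch.Size([1, 128, 1])
```

With `input_size=1` the network sees a single scalar feature per parameter; the actual feature set and preprocessing used in the paper are not recoverable from the table.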