A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Authors: James Harrison, Luke Metz, Jascha Sohl-Dickstein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Demonstrate experimentally that the resulting STabilized through Ample Regularization (STAR) learned optimizer is more stable and faster to meta-train, is more stable and performant than baseline learned optimizers even when applied for many more training steps than used during meta-training, and generalizes well to new tasks which are dissimilar from the tasks it was meta-trained on.
Researcher Affiliation | Industry | James Harrison, Luke Metz, Jascha Sohl-Dickstein; Google Research, Brain Team; {jamesharrison, lmetz, jaschasd}@google.com
Pseudocode | No | The paper describes mathematical formulations and architecture details but does not include a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The code for our optimizer is available here: https://github.com/google/learned_optimization/blob/main/learned_optimizers/adafac_nominal.py
Open Datasets | Yes | MLP on Fashion MNIST, CNN on CIFAR10... We apply the optimizer trained on the first task (a small MLP applied to Fashion MNIST) to a wide variety of learning tasks including large models such as a transformer [73]. Remarkably, the STAR... A shallow ResNet-like [68] model trained on 32x32 ImageNet [69]. (c) A 256 unit LSTM [70] language model trained on LM1B [71].
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] These details are available in the appendix.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Also available in the appendix.
Software Dependencies | No | The paper mentions using JAX [88] but does not provide specific version numbers for JAX or any other software dependencies.
Experiment Setup | Yes | Experimental details are provided in Appendix B and in our open source code. Experiments on additional tasks, with other values of weight decay, ablations of the primary components of the STAR optimizer, and visualization of different random seeds as opposed to aggregated statistics, are provided in Appendix C.
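For context on the Open Source Code row above: the linked repository (google/learned_optimization) is a JAX codebase, and the sketch below is only a minimal, hypothetical illustration of what a learned optimizer does in general, i.e. a small meta-learned network maps per-parameter features such as the gradient and momentum to a parameter update. The function and weight names here are invented for illustration and do not reflect the STAR architecture or the API in adafac_nominal.py.

```python
import jax
import jax.numpy as jnp

# Hypothetical sketch (NOT the STAR optimizer or the adafac_nominal.py API):
# a tiny meta-learned MLP maps per-parameter features (gradient, momentum)
# to a proposed update for each parameter.

def learned_update(theta, grad, mom, w1, w2, decay=0.9, step_scale=1e-3):
    """Apply one learned-optimizer step to a single parameter tensor."""
    mom = decay * mom + (1.0 - decay) * grad   # running momentum feature
    feats = jnp.stack([grad, mom], axis=-1)    # per-parameter features, shape (..., 2)
    hidden = jax.nn.relu(feats @ w1)           # shape (..., 4)
    delta = (hidden @ w2)[..., 0]              # proposed per-parameter update
    return theta - step_scale * delta, mom

# Toy usage on one weight matrix; w1 and w2 would be meta-learned in practice.
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
w1 = 0.1 * jax.random.normal(key1, (2, 4))
w2 = 0.1 * jax.random.normal(key2, (4, 1))
theta = jnp.ones((3, 3))
grad = jnp.full((3, 3), 0.5)
mom = jnp.zeros((3, 3))
theta, mom = learned_update(theta, grad, mom, w1, w2)
```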