A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Authors: James Harrison, Luke Metz, Jascha Sohl-Dickstein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Demonstrate experimentally that the resulting STabilized through Ample Regularization (STAR) learned optimizer is more stable and faster to meta-train, is more stable and performant than baseline learned optimizers even when applied for many more training steps than used during meta-training, and generalizes well to new tasks which are dissimilar from the tasks it was meta-trained on.
Researcher Affiliation | Industry | James Harrison, Luke Metz, Jascha Sohl-Dickstein; Google Research, Brain Team; {jamesharrison, lmetz, jaschasd}@google.com
Pseudocode | No | The paper describes mathematical formulations and architecture details but does not include a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The code for our optimizer is available here: https://github.com/google/learned_optimization/blob/main/learned_optimizers/adafac_nominal.py
Open Datasets | Yes | MLP on Fashion MNIST, CNN on CIFAR10... We apply the optimizer trained on the first task (a small MLP applied to Fashion MNIST) to a wide variety of learning tasks including large models such as a transformer [73]. Remarkably, the STAR... A shallow ResNet-like [68] model trained on 32x32 ImageNet [69]. (c) A 256 unit LSTM [70] language model trained on LM1B [71].
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] These details are available in the appendix.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Also available in the appendix.
Software Dependencies | No | The paper mentions using JAX [88] but does not provide specific version numbers for JAX or any other software dependencies.
Experiment Setup | Yes | Experimental details are provided in Appendix B and in our open source code. Experiments on additional tasks, with other values of weight decay, ablations of the primary components of the STAR optimizer, and visualization of different random seeds as opposed to aggregated statistics, are provided in Appendix C.
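For context on the Open Source Code row above: the linked repository (google/learned_optimization) is a JAX codebase, and the sketch below is only a minimal, hypothetical illustration of what a learned optimizer does in general, i.e. a small meta-learned network maps per-parameter features such as the gradient and momentum to a parameter update. The function and weight names here are invented for illustration and do not reflect the STAR architecture or the API in adafac_nominal.py.

```python
import jax
import jax.numpy as jnp

# Hypothetical sketch (NOT the STAR optimizer or the adafac_nominal.py API):
# a tiny meta-learned MLP maps per-parameter features (gradient, momentum)
# to a proposed update for each parameter.

def learned_update(theta, grad, mom, w1, w2, decay=0.9, step_scale=1e-3):
    """Apply one learned-optimizer step to a single parameter tensor."""
    mom = decay * mom + (1.0 - decay) * grad   # running momentum feature
    feats = jnp.stack([grad, mom], axis=-1)    # per-parameter features, shape (..., 2)
    hidden = jax.nn.relu(feats @ w1)           # shape (..., 4)
    delta = (hidden @ w2)[..., 0]              # proposed per-parameter update
    return theta - step_scale * delta, mom

# Toy usage on one weight matrix; w1 and w2 would be meta-learned in practice.
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
w1 = 0.1 * jax.random.normal(key1, (2, 4))
w2 = 0.1 * jax.random.normal(key2, (4, 1))
theta = jnp.ones((3, 3))
grad = jnp.full((3, 3), 0.5)
mom = jnp.zeros((3, 3))
theta, mom = learned_update(theta, grad, mom, w1, w2)
```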