Optimistic Meta-Gradients

Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado P. van Hasselt, András György, Satinder Singh

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We consider the problem of minimizing an ill-conditioned convex quadratic and compare standard momentum to a version with meta-learned step-size, i.e. ϕ : (x, w) ↦ w ⊙ ∇f(x), where ⊙ is the Hadamard product. We find that introducing a non-linearity ϕ leads to a sizeable improvement in the rate of convergence. See Section 7.1 for further details." (A code sketch of this setup follows the table.)
Researcher Affiliation | Industry | Sebastian Flennerhag, Google DeepMind, flennerhag@google.com; Tom Zahavy, Google DeepMind; Brendan O'Donoghue, Google DeepMind; Hado van Hasselt, Google DeepMind; András György, Google DeepMind; Satinder Singh, Google DeepMind
Pseudocode | Yes | "Algorithm 1: Meta-learning in practice. ... Algorithm 2: Meta-learning in the convex setting. ... Algorithm 3: BMG in practice. ... Algorithm 4: Convex optimistic meta-learning." (A sketch illustrating the generic optimistic update follows the table.)
Open Source Code | No | The paper does not provide any explicit statement or link for the release of its source code.
Open Datasets | Yes | "We train a 50-layer ResNet following a standard protocol (Appendix C) with SGD as the baseline optimiser. ... Figure 1: ImageNet. We compare training a 50-layer ResNet using SGD against variants that tune an element-wise learning rate online using standard meta-learning or optimistic meta-learning."
Dataset Splits | No | The paper mentions training steps and test accuracy but does not specify train/validation/test splits by percentage or sample count, nor does it refer to predefined splits with citations.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | "For each Q and each algorithm, we sweep over the learning rate, decay rate, and the initialization of w (see Table 2 for values) and report results for the best performing hyper-parameters. ... We sweep over the learning rate (for SGD) or meta-learning rate and report results for the best hyper-parameter over three independent runs." (A sketch of this sweep-and-select protocol follows the table.)
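
The Research Type and Open Datasets rows both quote the paper's element-wise, meta-learned step size ϕ : (x, w) ↦ w ⊙ ∇f(x). The sketch below is a minimal illustration of that idea on an ill-conditioned quadratic, assuming a one-step meta-objective, a log-parameterisation of w for numerical stability, and illustrative hyper-parameters; it is not the paper's implementation or its Section 7.1 settings.

```python
# Minimal sketch (not the paper's code): an element-wise, meta-learned step
# size on an ill-conditioned convex quadratic, next to a heavy-ball momentum
# baseline. Dimension, condition number, learning rates, and the one-step
# meta-objective are illustrative assumptions.
import jax
import jax.numpy as jnp

d = 10
Q = jnp.diag(jnp.logspace(0, 3, d))   # eigenvalues from 1 to 1e3: ill-conditioned
f = lambda x: 0.5 * x @ Q @ x         # convex quadratic
grad_f = jax.grad(f)

def momentum_run(steps=200, lr=1e-3, beta=0.9):
    """Standard heavy-ball momentum baseline."""
    x, m = jnp.ones(d), jnp.zeros(d)
    for _ in range(steps):
        m = beta * m + grad_f(x)
        x = x - lr * m
    return f(x)

def meta_step_size_run(steps=200, meta_lr=1e-2, w0=1e-3):
    """phi(x, w) = w * grad_f(x): the element-wise step size w is adapted
    online with a meta-gradient through the post-update loss. Writing
    w = exp(theta) is an implementation choice made here for positivity and
    stability, not something taken from the paper."""
    x = jnp.ones(d)
    theta = jnp.log(jnp.full(d, w0))

    def post_update_loss(theta, x):
        w = jnp.exp(theta)
        return f(x - w * grad_f(x))   # loss after one step of phi

    meta_grad = jax.grad(post_update_loss)
    for _ in range(steps):
        theta = theta - meta_lr * meta_grad(theta, x)   # meta-gradient step
        x = x - jnp.exp(theta) * grad_f(x)              # inner update via phi
    return f(x)

# The paper selects hyper-parameters by sweeping; these single settings only
# show the mechanics and are not meant to reproduce its comparison.
print("momentum:", momentum_run(), "meta step size:", meta_step_size_run())
```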
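
The Pseudocode row lists the paper's algorithms by name only. To make the "optimistic" terminology concrete, the sketch below shows the standard optimistic gradient update that uses the most recent gradient as a hint for the next one; it is a generic textbook form, not a reconstruction of Algorithms 1-4, and the toy objective and step size are placeholders.

```python
# Illustrative only: generic optimistic online gradient descent, where the
# previous gradient serves as a hint for the next one. Not the paper's
# pseudocode.
import jax
import jax.numpy as jnp

def optimistic_gd(grad_fn, x0, lr=0.05, steps=100):
    """x_{t+1} = x_t - lr * (2 g_t - g_{t-1}): a gradient step plus a
    correction that anticipates (is "optimistic" about) the next gradient
    being close to the current one."""
    x, g_prev = x0, jnp.zeros_like(x0)
    for _ in range(steps):
        g = grad_fn(x)
        x = x - lr * (2.0 * g - g_prev)
        g_prev = g
    return x

# Toy convex objective (a placeholder, not one of the paper's experiments).
loss = lambda x: jnp.sum((x - 3.0) ** 2)
x_star = optimistic_gd(jax.grad(loss), jnp.zeros(3))
print(x_star)   # approaches [3, 3, 3]
```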
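
The Experiment Setup row describes a sweep-and-select protocol: a grid over the learning rate, decay rate, and initialization of w, with the best setting reported over three independent runs. The loop below sketches that protocol using placeholder grids and a dummy training function; the actual grid values are in the paper's Table 2.

```python
# Sketch of the quoted selection protocol (sweep, then report the best
# setting). The grids and run_experiment are placeholders, not Table 2.
import itertools
import numpy as np

def run_experiment(lr, decay, w_init, seed):
    """Placeholder for training one hyper-parameter setting; returns a dummy
    scalar standing in for the final loss."""
    rng = np.random.default_rng(seed)
    return ((lr - 1e-2) ** 2 + (decay - 0.9) ** 2 + (w_init - 1e-3) ** 2
            + 1e-4 * rng.standard_normal())

lrs     = [1e-3, 1e-2, 1e-1]   # illustrative grid, not the paper's values
decays  = [0.9, 0.99, 0.999]
w_inits = [1e-4, 1e-3, 1e-2]

results = {}
for lr, decay, w_init in itertools.product(lrs, decays, w_inits):
    # Average over three independent runs, as described in the quoted setup.
    scores = [run_experiment(lr, decay, w_init, seed) for seed in range(3)]
    results[(lr, decay, w_init)] = float(np.mean(scores))

best = min(results, key=results.get)
print("best (lr, decay, w_init):", best, "score:", results[best])
```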