Optimistic Meta-Gradients
Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado P. van Hasselt, András György, Satinder Singh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider the problem of minimizing an ill-conditioned convex quadratic and compare standard momentum to a version with meta-learned step-size, i.e. ϕ : (x, w) ↦ w ⊙ ∇f(x), where ⊙ is the Hadamard product. We find that introducing a non-linearity ϕ leads to a sizeable improvement in the rate of convergence. See Section 7.1 for further details. (An illustrative sketch of this setup follows the table.) |
| Researcher Affiliation | Industry | Sebastian Flennerhag Google DeepMind flennerhag@google.com Tom Zahavy Google DeepMind Brendan O'Donoghue Google DeepMind Hado van Hasselt Google DeepMind András György Google DeepMind Satinder Singh Google DeepMind |
| Pseudocode | Yes | Algorithm 1: Meta-learning in practice. ... Algorithm 2: Meta-learning in the convex setting. ... Algorithm 3: BMG in practice. ... Algorithm 4: Convex optimistic meta-learning. |
| Open Source Code | No | The paper does not provide any explicit statement or link for the release of its source code. |
| Open Datasets | Yes | We train a 50-layer ResNet following a standard protocol (Appendix C) with SGD as the baseline optimiser. ... Figure 1: ImageNet. We compare training a 50-layer ResNet using SGD against variants that tune an element-wise learning rate online using standard meta-learning or optimistic meta-learning. |
| Dataset Splits | No | The paper mentions training steps and test accuracy but does not specify train/validation/test splits by percentage or sample count, nor does it refer to predefined splits with citations. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For each Q and each algorithm, we sweep over the learning rate, decay rate, and the initialization of w (see Table 2 for values) and report results for the best performing hyper-parameters. ... We sweep over the learning rate (for SGD) or meta-learning rate and report results for the best hyper-parameter over three independent runs. |
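
The "Research Type" entry above quotes the paper's ill-conditioned quadratic experiment with a meta-learned element-wise step size. The snippet below is a minimal sketch of that mechanic only: it applies a plain one-step meta-gradient update to the step-size vector `w`, not the paper's momentum-based or optimistic variants, and every constant (dimension, eigenvalues, step counts, learning rates) is an assumed value chosen for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one-step meta-gradient adaptation
# of an element-wise step size w for gradient descent on an ill-conditioned
# quadratic f(x) = 0.5 * x^T Q x. All constants below are assumed values.

eigs = np.array([1.0, 3.0, 10.0, 30.0])    # spectrum of Q (assumed)
Q = np.diag(eigs)

def f(x):
    return 0.5 * x @ Q @ x

def grad(x):
    return Q @ x

steps = 500
x0 = np.ones(len(eigs))
w0 = 1e-3                                   # deliberately conservative initial step size

# Baseline: gradient descent with the fixed (untuned) step size w0.
x = x0.copy()
for _ in range(steps):
    x = x - w0 * grad(x)
baseline_loss = f(x)

# Meta-learned element-wise step size: inner update x_new = x - w ⊙ ∇f(x),
# with w adjusted by the one-step meta-gradient d f(x_new) / d w.
x = x0.copy()
w = np.full(len(eigs), w0)
meta_lr = 3e-5                              # assumed meta learning rate
for _ in range(steps):
    g = grad(x)
    x_new = x - w * g                       # inner update
    meta_grad = -grad(x_new) * g            # d f(x_new) / d w (element-wise)
    w = w - meta_lr * meta_grad             # outer (meta) update
    x = x_new

print(f"fixed step size : loss = {baseline_loss:.2e}")
print(f"meta-learned w  : loss = {f(x):.2e}")
```

Because the inner update is linear in `w`, the one-step meta-gradient reduces to `-grad(x_new) * grad(x)` element-wise, so coordinates whose gradients stay aligned across the update receive larger step sizes, while coordinates that overshoot receive smaller ones.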