Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Meta-Gradient Reinforcement Learning
Authors: Zhongwen Xu, Hado P. van Hasselt, David Silver
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance. |
| Researcher Affiliation | Industry | Zhongwen Xu, DeepMind, EMAIL; Hado van Hasselt, DeepMind, EMAIL; David Silver, DeepMind, EMAIL |
| Pseudocode | Yes | The pseudo-code for the meta-gradient reinforcement learning algorithm is provided in Appendix A. |
| Open Source Code | No | The paper describes implementation details and refers to existing frameworks (e.g., IMPALA, TensorFlow) but does not provide an explicit statement about releasing its own source code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We validate the proposed approach on Atari 2600 video games from Arcade Learning Environment (ALE) [Bellemare et al., 2013], a standard benchmark for deep reinforcement learning algorithms. |
| Dataset Splits | Yes | Our meta-gradient RL approach is based on the principle of online cross-validation [Sutton, 1992], using successive samples of experience. |
| Hardware Specification | No | The paper mentions using a 'deep ResNet architecture' and an 'efficient distributed implementation' (IMPALA) but does not provide specific details on the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'TensorFlow' and the 'IMPALA framework', but it does not specify version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | For self-contained purpose, we provide all of the important hyper-parameters used in this paper, including the ones following Espeholt et al. [2018] and the additional meta-learning optimisation hyper-parameters (i.e., meta batch size, meta learning rate β, embedding size for η), in Appendix B. |