Meta-Gradient Reinforcement Learning
Authors: Zhongwen Xu, Hado P. van Hasselt, David Silver
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance. |
| Researcher Affiliation | Industry | Zhongwen Xu, DeepMind, zhongwen@google.com; Hado van Hasselt, DeepMind, hado@google.com; David Silver, DeepMind, davidsilver@google.com |
| Pseudocode | Yes | The pseudo-code for the meta-gradient reinforcement learning algorithm is provided in Appendix A. (A minimal illustrative sketch of the update appears below this table.) |
| Open Source Code | No | The paper describes implementation details and refers to existing frameworks (e.g., IMPALA, TensorFlow) but does not provide an explicit statement about releasing its own source code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We validate the proposed approach on Atari 2600 video games from Arcade Learning Environment (ALE) [Bellemare et al., 2013], a standard benchmark for deep reinforcement learning algorithms. |
| Dataset Splits | Yes | Our meta-gradient RL approach is based on the principle of online cross-validation [Sutton, 1992], using successive samples of experience. |
| Hardware Specification | No | The paper mentions using a 'deep ResNet architecture' and an 'efficient distributed implementation' (IMPALA) but does not provide specific details on the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'TensorFlow' and the 'IMPALA framework', but it does not specify version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | For self-contained purpose, we provide all of the important hyper-parameters used in this paper, including the ones following Espeholt et al. [2018] and the additional meta-learning optimisation hyper-parameters (i.e., meta batch size, meta learning rate β, embedding size for η), in Appendix B. |
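
The Pseudocode and Dataset Splits rows above refer to the paper's core mechanism: an inner update of the agent parameters θ under meta-parameters η, followed by a meta-update of η evaluated on a successive, held-out sample of experience (online cross-validation). The sketch below illustrates that two-level structure on a toy linear TD(0) prediction problem, with the discount γ standing in for η. Everything here is an illustrative assumption: the function names (`sample_transition`, `inner_update`, `meta_gradient`), the toy environment, the squared-TD-error meta-objective, and the constants are ours, not the paper's IMPALA-scale Atari implementation.

```python
import random

# Toy sketch of meta-gradient RL (Xu et al., 2018) on linear TD(0) prediction.
# Here the meta-parameter eta is the discount gamma, as in the paper's main
# experiment. This is an assumption-laden illustration, not the paper's code.

ALPHA = 0.1       # inner (agent) learning rate
BETA = 0.01       # meta learning rate (beta in the paper)
GAMMA_BAR = 0.95  # fixed reference discount (eta-bar) used by the meta-objective J'

def sample_transition():
    """Hypothetical one-step prediction problem: features x, x_next and reward r."""
    x = random.uniform(0.5, 1.5)
    x_next = random.uniform(0.5, 1.5)
    r = 0.1 * x
    return x, r, x_next

def inner_update(theta, gamma, tau):
    """Semi-gradient TD(0) update: theta' = theta + alpha * delta * x.

    Also returns d(theta')/d(gamma), which the meta-gradient needs; only the
    bootstrapped target r + gamma * theta * x_next depends on gamma here.
    """
    x, r, x_next = tau
    delta = r + gamma * theta * x_next - theta * x
    theta_new = theta + ALPHA * delta * x
    dtheta_dgamma = ALPHA * x * (theta * x_next)
    return theta_new, dtheta_dgamma

def meta_gradient(theta_new, dtheta_dgamma, tau_val):
    """Gradient of a validation loss J'(tau', theta', eta-bar) w.r.t. gamma.

    J' = 0.5 * delta'^2 is evaluated with the *fixed* reference discount
    GAMMA_BAR on held-out experience, following the online cross-validation
    principle; the chain rule carries the gradient through the inner update.
    """
    x, r, x_next = tau_val
    delta_val = r + GAMMA_BAR * theta_new * x_next - theta_new * x
    dJ_dtheta = delta_val * (GAMMA_BAR * x_next - x)
    return dJ_dtheta * dtheta_dgamma

theta, gamma = 0.0, 0.5
for step in range(1000):
    tau = sample_transition()      # experience for the inner (agent) update
    tau_val = sample_transition()  # successive sample, held out for validation
    theta, dtheta_dgamma = inner_update(theta, gamma, tau)
    gamma -= BETA * meta_gradient(theta, dtheta_dgamma, tau_val)
    gamma = min(max(gamma, 0.0), 1.0)  # keep the discount in [0, 1]

print(f"adapted gamma = {gamma:.3f}, theta = {theta:.3f}")
```

In the paper itself, θ is a full IMPALA actor-critic network, η comprises both the discount γ and the bootstrapping parameter λ, and the meta-gradient is computed by differentiating through the update (with an accumulated trace) rather than by the hand-derived scalar chain rule above.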