Meta-Gradient Reinforcement Learning
Authors: Zhongwen Xu, Hado P. van Hasselt, David Silver
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance. |
| Researcher Affiliation | Industry | Zhongwen Xu, DeepMind, zhongwen@google.com; Hado van Hasselt, DeepMind, hado@google.com; David Silver, DeepMind, davidsilver@google.com |
| Pseudocode | Yes | The pseudo-code for the meta-gradient reinforcement learning algorithm is provided in Appendix A. (A minimal illustrative sketch of the update appears below this table.) |
| Open Source Code | No | The paper describes implementation details and refers to existing frameworks (e.g., IMPALA, TensorFlow) but does not provide an explicit statement about releasing its own source code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We validate the proposed approach on Atari 2600 video games from Arcade Learning Environment (ALE) [Bellemare et al., 2013], a standard benchmark for deep reinforcement learning algorithms. |
| Dataset Splits | Yes | Our meta-gradient RL approach is based on the principle of online cross-validation [Sutton, 1992], using successive samples of experience. |
| Hardware Specification | No | The paper mentions using a 'deep ResNet architecture' and an 'efficient distributed implementation' (IMPALA) but does not provide specific details on the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'TensorFlow' and the 'IMPALA framework', but it does not specify version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | For self-contained purpose, we provide all of the important hyper-parameters used in this paper, including the ones following Espeholt et al. [2018] and the additional meta-learning optimisation hyper-parameters (i.e., meta batch size, meta learning rate β, embedding size for η), in Appendix B. |
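
The Pseudocode and Dataset Splits rows above refer to the paper's core mechanism: an inner update of the agent parameters θ under meta-parameters η, followed by a meta-update of η evaluated on a successive, held-out sample of experience (online cross-validation). The sketch below illustrates that two-level structure on a toy linear TD(0) prediction problem, with the discount γ standing in for η. Everything here is an illustrative assumption: the function names (`sample_transition`, `inner_update`, `meta_gradient`), the toy environment, the squared-TD-error meta-objective, and the constants are ours, not the paper's IMPALA-scale Atari implementation.

```python
import random

# Toy sketch of meta-gradient RL (Xu et al., 2018) on linear TD(0) prediction.
# Here the meta-parameter eta is the discount gamma, as in the paper's main
# experiment. This is an assumption-laden illustration, not the paper's code.

ALPHA = 0.1       # inner (agent) learning rate
BETA = 0.01       # meta learning rate (beta in the paper)
GAMMA_BAR = 0.95  # fixed reference discount (eta-bar) used by the meta-objective J'

def sample_transition():
    """Hypothetical one-step prediction problem: features x, x_next and reward r."""
    x = random.uniform(0.5, 1.5)
    x_next = random.uniform(0.5, 1.5)
    r = 0.1 * x
    return x, r, x_next

def inner_update(theta, gamma, tau):
    """Semi-gradient TD(0) update: theta' = theta + alpha * delta * x.

    Also returns d(theta')/d(gamma), which the meta-gradient needs; only the
    bootstrapped target r + gamma * theta * x_next depends on gamma here.
    """
    x, r, x_next = tau
    delta = r + gamma * theta * x_next - theta * x
    theta_new = theta + ALPHA * delta * x
    dtheta_dgamma = ALPHA * x * (theta * x_next)
    return theta_new, dtheta_dgamma

def meta_gradient(theta_new, dtheta_dgamma, tau_val):
    """Gradient of a validation loss J'(tau', theta', eta-bar) w.r.t. gamma.

    J' = 0.5 * delta'^2 is evaluated with the *fixed* reference discount
    GAMMA_BAR on held-out experience, following the online cross-validation
    principle; the chain rule carries the gradient through the inner update.
    """
    x, r, x_next = tau_val
    delta_val = r + GAMMA_BAR * theta_new * x_next - theta_new * x
    dJ_dtheta = delta_val * (GAMMA_BAR * x_next - x)
    return dJ_dtheta * dtheta_dgamma

theta, gamma = 0.0, 0.5
for step in range(1000):
    tau = sample_transition()      # experience for the inner (agent) update
    tau_val = sample_transition()  # successive sample, held out for validation
    theta, dtheta_dgamma = inner_update(theta, gamma, tau)
    gamma -= BETA * meta_gradient(theta, dtheta_dgamma, tau_val)
    gamma = min(max(gamma, 0.0), 1.0)  # keep the discount in [0, 1]

print(f"adapted gamma = {gamma:.3f}, theta = {theta:.3f}")
```

In the paper itself, θ is a full IMPALA actor-critic network, η comprises both the discount γ and the bootstrapping parameter λ, and the meta-gradient is computed by differentiating through the update (with an accumulated trace) rather than by the hand-derived scalar chain rule above.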