Emphatic Algorithms for Deep Reinforcement Learning
Authors: Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we provide an in-depth comparison of their qualitative properties on small diagnostic MDPs in Sec. 3. Finally, we demonstrate that combining emphatic trace with deep neural networks can improve performance on classic Atari video games in Sec. 4, reporting the highest score to date for an RL agent without experience replay in the 200M frames data regime: 497% median human normalized score across 57 games, improved from the baseline performance of 403%. |
| Researcher Affiliation | Collaboration | Ray Jiang (1), Tom Zahavy (1), Zhongwen Xu (1), Adam White (1, 2), Matteo Hessel (1), Charles Blundell (1), Hado van Hasselt (1). (1) DeepMind, London, UK. (2) Amii, Department of Computing Science, University of Alberta. |
| Pseudocode | Yes | Algorithm 1: WETD-weighted n-step TD; Algorithm 2: NETD-weighted n-step TD; Algorithm 3: NETD-ACE (Surreal). |
| Open Source Code | No | The paper mentions using specific open-source libraries (e.g., Jax libraries, RLax, Haiku, Optax) and provides URLs for these *third-party* libraries. However, it does not provide an explicit statement or link for the source code of the *methodology described in this paper* (e.g., the emphatic algorithms, WETD, NETD, or Surreal). |
| Open Datasets | Yes | Thus we evaluated the emphatic algorithms on Atari games from the Arcade Learning Environment (Bellemare et al., 2013), a widely used deep RL benchmark. |
| Dataset Splits | No | The paper describes a 'training regime' (200M frames) and an 'evaluation phase' but does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits of a single dataset). |
| Hardware Specification | Yes | For the implementation of Surreal, we used Jax libraries (Budden et al., 2020; Hennigan et al., 2020; Hessel et al., 2020) on a TPU Pod infrastructure called Sebulba (Hessel et al., 2021). |
| Software Dependencies | No | The paper mentions using 'Jax libraries' for implementation but does not provide specific version numbers for Jax or any other software dependencies such as PyTorch, TensorFlow, or specific library versions like RLax, Haiku, Optax. |
| Experiment Setup | Yes | In the mixed update scheme, n = 40, α = 6×10⁻⁴, and max gradient norm = 0.3 yielded the best results. In the fixed update scheme, the best hyper-parameters for Surreal were n = 10, α = 2×10⁻⁴, and max gradient norm = 1. |
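For context on the emphatic-trace algorithms the table refers to, the sketch below shows a single update of standard linear emphatic TD(λ) (Sutton et al., 2016), the classical method that the paper's WETD/NETD variants build on. This is an illustrative sketch, not the paper's WETD, NETD, or Surreal algorithms; the function name and argument layout are our own, and for simplicity the same importance ratio `rho` is used in both the followon and eligibility traces (the followon trace formally uses the previous step's ratio, which coincides with this in the on-policy case where ρ = 1).

```python
import numpy as np

def emphatic_td_update(w, x, x_next, r, gamma, lam, rho, interest, F, e, alpha):
    """One emphatic TD(lambda) step with linear value estimates.

    w        : weight vector (value estimate is w·x)
    x, x_next: feature vectors for current and next state
    rho      : importance-sampling ratio (1.0 when on-policy)
    interest : interest i_t in the current state
    F, e     : followon trace (scalar) and eligibility trace (vector)
    Returns the updated (w, F, e).
    """
    # Followon trace: discounted, importance-weighted accumulation of interest.
    F = gamma * rho * F + interest
    # Emphasis blends instantaneous interest with the followon trace.
    M = lam * interest + (1.0 - lam) * F
    # Emphatically weighted eligibility trace.
    e = rho * (gamma * lam * e + M * x)
    # Standard TD error; weights move along the emphatic trace.
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)
    w = w + alpha * delta * e
    return w, F, e
```

On a small on-policy chain with one-hot features, iterating this update drives `w` toward the true discounted values, which is the kind of diagnostic-MDP comparison the paper reports in its Section 3.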