Emphatic Algorithms for Deep Reinforcement Learning
Authors: Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we provide an in-depth comparison of their qualitative properties on small diagnostic MDPs in Sec. 3. Finally, we demonstrate that combining emphatic trace with deep neural networks can improve performance on classic Atari video games in Sec. 4, reporting the highest score to date for an RL agent without experience replay in the 200M frames data regime: 497% median human normalized score across 57 games, improved from the baseline performance of 403%. |
| Researcher Affiliation | Collaboration | Ray Jiang (1), Tom Zahavy (1), Zhongwen Xu (1), Adam White (1, 2), Matteo Hessel (1), Charles Blundell (1), Hado van Hasselt (1). (1) DeepMind, London, UK. (2) Amii, Department of Computing Science, University of Alberta. |
| Pseudocode | Yes | Algorithm 1: WETD-weighted n-step TD; Algorithm 2: NETD-weighted n-step TD; Algorithm 3: NETD-ACE (Surreal). |
| Open Source Code | No | The paper mentions using specific open-source libraries (e.g., Jax libraries, RLax, Haiku, Optax) and provides URLs for these *third-party* libraries. However, it does not provide an explicit statement or link for the source code of the *methodology described in this paper* (e.g., the emphatic algorithms, WETD, NETD, or Surreal). |
| Open Datasets | Yes | Thus we evaluated the emphatic algorithms on Atari games from the Arcade Learning Environment (Bellemare et al., 2013), a widely used deep RL benchmark. |
| Dataset Splits | No | The paper describes a 'training regime' (200M frames) and an 'evaluation phase' but does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits of a single dataset). |
| Hardware Specification | Yes | For the implementation of Surreal, we used Jax libraries (Budden et al., 2020; Hennigan et al., 2020; Hessel et al., 2020) on a TPU Pod infrastructure called Sebulba (Hessel et al., 2021). |
| Software Dependencies | No | The paper mentions using 'Jax libraries' for implementation but does not provide specific version numbers for Jax or any other software dependencies such as PyTorch, TensorFlow, or specific library versions like RLax, Haiku, Optax. |
| Experiment Setup | Yes | In the mixed update scheme, n = 40, α = 6×10⁻⁴, and max gradient norm = 0.3 yielded the best results. In the fixed update scheme, the best hyper-parameters for Surreal were n = 10, α = 2×10⁻⁴, and max gradient norm = 1. |
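For context on the emphatic-trace algorithms the table refers to, the sketch below shows a single update of standard linear emphatic TD(λ) (Sutton et al., 2016), the classical method that the paper's WETD/NETD variants build on. This is an illustrative sketch, not the paper's WETD, NETD, or Surreal algorithms; the function name and argument layout are our own, and for simplicity the same importance ratio `rho` is used in both the followon and eligibility traces (the followon trace formally uses the previous step's ratio, which coincides with this in the on-policy case where ρ = 1).

```python
import numpy as np

def emphatic_td_update(w, x, x_next, r, gamma, lam, rho, interest, F, e, alpha):
    """One emphatic TD(lambda) step with linear value estimates.

    w        : weight vector (value estimate is w·x)
    x, x_next: feature vectors for current and next state
    rho      : importance-sampling ratio (1.0 when on-policy)
    interest : interest i_t in the current state
    F, e     : followon trace (scalar) and eligibility trace (vector)
    Returns the updated (w, F, e).
    """
    # Followon trace: discounted, importance-weighted accumulation of interest.
    F = gamma * rho * F + interest
    # Emphasis blends instantaneous interest with the followon trace.
    M = lam * interest + (1.0 - lam) * F
    # Emphatically weighted eligibility trace.
    e = rho * (gamma * lam * e + M * x)
    # Standard TD error; weights move along the emphatic trace.
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)
    w = w + alpha * delta * e
    return w, F, e
```

On a small on-policy chain with one-hot features, iterating this update drives `w` toward the true discounted values, which is the kind of diagnostic-MDP comparison the paper reports in its Section 3.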