Learning Expected Emphatic Traces for Deep RL

Authors: Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt
Pages: 7015-7023

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested the approach at scale on Atari 2600 video games, and observed that the new X-ETD(n) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.
Researcher Affiliation | Collaboration | (1) DeepMind, London, UK; (2) University of Oxford, Oxford, UK; (3) McGill University, Montreal, QC, Canada; (4) Amii, Department of Computing Science, University of Alberta
Pseudocode | No | The paper describes algorithms using mathematical equations and textual explanations (e.g., 'w_{t+1} = w_t + α F_t ...'), but it does not present a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper mentions 'JAX libraries (Hennigan et al. 2020; Budden et al. 2020; Hessel et al. 2020)' and that these are 'Licensed under Apache License 2.0', but it does not state that the code the authors developed for their method is open source, nor does it provide a link to it.
Open Datasets | Yes | We evaluate X-ETD(n) on a widely used deep RL benchmark, Atari games from the Arcade Learning Environment (Bellemare et al. 2013)
Dataset Splits | No | The paper mentions an 'evaluation phase at 200M-250M learning frames' and a baseline that was 'swept extensively on its hyperparameters to produce the best baseline', which implies some form of validation. However, it does not specify explicit train/validation/test splits with percentages or absolute counts, nor does it cite a predefined split.
Hardware Specification | Yes | We implement all agents in a distributed system based on JAX libraries (Hennigan et al. 2020; Budden et al. 2020; Hessel et al. 2020) using a TPU Pod infrastructure called Sebulba (Hessel et al. 2021b).
Software Dependencies | No | The paper mentions using 'JAX libraries' and indicates they are 'Licensed under Apache License 2.0' (referring to the cited papers). However, it does not specify version numbers for JAX or for any other software dependency used in the experiments.
Experiment Setup | Yes | We tested all combinations of α_w ∈ {2^{-i} : i = 6, ..., 14} and α_θ = α_w β, with β ∈ {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0}. [...] The input observations are in RGB format without downsampling or gray scaling. We use an action repeat of 4, with max pooling over the last two frames and the life termination signal.
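
The step-size sweep quoted in the Experiment Setup row can be enumerated in a few lines. The sketch below is illustrative only and is not the authors' code. It assumes the α_w values are negative powers of two (2^-i for i = 6, ..., 14), which is the usual reading for step sizes but is not unambiguous in the extracted text. The names ALPHA_W, BETA and sweep_grid are hypothetical.

```python
from itertools import product

# Assumption: alpha_w ranges over 2**-i for i = 6, ..., 14 (nine values).
ALPHA_W = [2.0 ** -i for i in range(6, 15)]

# Multipliers beta as listed in the Experiment Setup row (ten values).
BETA = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0]


def sweep_grid():
    """Return every (alpha_w, alpha_theta) pair, with alpha_theta = alpha_w * beta."""
    return [(alpha_w, alpha_w * beta) for alpha_w, beta in product(ALPHA_W, BETA)]


grid = sweep_grid()
print(len(grid))  # 9 step sizes x 10 multipliers = 90 combinations
```

Under this reading the sweep covers 90 configurations, with α_θ always tied to α_w through β rather than tuned independently.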