Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Expected Emphatic Traces for Deep RL

Authors: Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt (pp. 7015-7023)

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We tested the approach at scale on Atari 2600 video games, and observed that the new X-ETD(n) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.
Researcher Affiliation	Collaboration	1 DeepMind, London, UK; 2 University of Oxford, Oxford, UK; 3 McGill University, Montreal, QC, Canada; 4 Amii, Department of Computing Science, University of Alberta
Pseudocode	No	The paper describes algorithms using mathematical equations and textual explanations (e.g., 'w_{t+1} = w_t + α F_t ...'), but it does not present a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	No	The paper mentions 'JAX libraries (Hennigan et al. 2020; Budden et al. 2020; Hessel et al. 2020)' and that these are 'Licensed under Apache License 2.0', but it does not state that the code developed by the authors for their methodology is open-source or provide a link to it.
Open Datasets	Yes	We evaluate X-ETD(n) on a widely used deep RL benchmark, Atari games from the Arcade Learning Environment (Bellemare et al. 2013)
Dataset Splits	No	The paper mentions an 'evaluation phase at 200M-250M learning frames' and 'swept extensively on its hyperparameters to produce the best baseline', implying validation. However, it does not specify explicit train/validation/test dataset splits with percentages or absolute counts, nor does it cite a predefined split.
Hardware Specification	Yes	We implement all agents in a distributed system based on JAX libraries (Hennigan et al. 2020; Budden et al. 2020; Hessel et al. 2020) using a TPU Pod infrastructure called Sebulba (Hessel et al. 2021b).
Software Dependencies	No	The paper mentions using 'JAX libraries' and indicates they are 'Licensed under Apache License 2.0' (referring to the cited papers). However, it does not specify version numbers for JAX or any other software dependencies used in the experiments.
Experiment Setup	Yes	We tested all combinations of α_w ∈ {2^{-i} : i = 6, ..., 14} and α_θ = α_w β, with β ∈ {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0}. [...] The input observations are in RGB format without downsampling or gray scaling. We use an action repeat of 4, with max pooling over the last two frames and the life termination signal.
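
The step-size sweep quoted in the Experiment Setup row can be enumerated with a short script. This is a minimal sketch, not the authors' code: the names ALPHA_W, BETA, and sweep_grid are hypothetical, and it assumes the α_w values are negative powers of two (2^-i for i = 6 to 14), which is an interpretation of the extracted text rather than something this page confirms.

```python
from itertools import product

# Assumption: alpha_w takes the values 2**-i for i = 6..14 (9 values).
# The exponent sign is an interpretation of the quoted sweep.
ALPHA_W = [2.0 ** -i for i in range(6, 15)]

# Ratios beta as listed in the Experiment Setup row (10 values).
BETA = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0]


def sweep_grid(alpha_ws=ALPHA_W, betas=BETA):
    """Return every (alpha_w, alpha_theta) pair, with alpha_theta = alpha_w * beta."""
    return [(a_w, a_w * beta) for a_w, beta in product(alpha_ws, betas)]


grid = sweep_grid()
print(len(grid))  # 9 step sizes x 10 ratios = 90 combinations
```

Under these assumptions the sweep covers 90 (α_w, α_θ) pairs. Tying α_θ to α_w through the ratio β keeps the two step sizes on a shared scale, so the grid explores their relative magnitude rather than two independent ranges.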