When should agents explore?
Authors: Miruna Pislar, David Szepesvari, Georg Ostrovski, Diana L Borsa, Tom Schaul
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report a promising and detailed analysis on Atari, using two-mode exploration and switching at sub-episodic time-scales. |
| Researcher Affiliation | Industry | DeepMind, London, UK {mirunapislar,dsz,ostrovski,borsa,tom}@deepmind.com |
| Pseudocode | Yes | Algorithm 1 describes how this is done in pseudo-code. (A hedged sketch of such intra-episodic mode switching appears below this table.) |
| Open Source Code | No | The paper cites several open-source libraries used in the implementation (JAX, Haiku, Optax, Chex, RLax, and Reverb) and gives their GitHub URLs, but it does not release, or link to, code specific to the paper's own method or contributions. |
| Open Datasets | Yes | We conduct our investigations on a subset of games of the Arcade Learning Environment (Bellemare et al., 2013) |
| Dataset Splits | No | The paper describes data-handling details such as replay sequence length and burn-in, and how evaluation is performed, but it gives no explicit training/validation/test splits (percentages, sample counts, or predefined splits) for the Arcade Learning Environment. |
| Hardware Specification | Yes | one such experiment takes about 12 hours using 2 TPUs (one for the batch inference, the other for the learner) and 120 CPUs. |
| Software Dependencies | Yes | Our agent is implemented with JAX (Bradbury et al., 2018), uses the Haiku (Hennigan et al., 2020), Optax (Budden et al., 2020b), Chex (Budden et al., 2020a), and RLax (Hessel et al., 2020) libraries for neural networks, optimisation, testing, and RL losses, respectively, and Reverb (Cassirer et al., 2020) for distributed experience replay. (A hedged sketch of how these libraries typically compose appears after this table.) |
| Experiment Setup | Yes | Refer to Table 1 for a full list of hyper-parameters. |
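
The pseudocode row above refers to the paper's Algorithm 1. As a rough illustration only, here is a minimal Python sketch of two-mode exploration with switching at sub-episodic time-scales; the probabilistic "blind" trigger, the Gym-style `env`/policy interfaces, and all names are assumptions for illustration, not the paper's exact algorithm.

```python
import random

def run_episode(env, exploit_policy, explore_policy, switch_prob=0.05):
    """Act in 'exploit' mode by default, occasionally flipping into an
    'explore' mode mid-episode and back (sub-episodic switching).

    The probabilistic 'blind' trigger below is only one of the trigger
    types discussed in the paper; env and the two policies are
    hypothetical placeholders.
    """
    obs = env.reset()
    mode = "exploit"
    done = False
    episode_return = 0.0
    while not done:
        # Blind (time-based, probabilistic) switching: with small
        # probability, toggle the behaviour mode within the episode.
        if random.random() < switch_prob:
            mode = "explore" if mode == "exploit" else "exploit"
        policy = exploit_policy if mode == "exploit" else explore_policy
        obs, reward, done, _ = env.step(policy(obs))
        episode_return += reward
    return episode_return
```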
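The software-dependencies row lists the DeepMind JAX ecosystem. As a hedged sketch of how these libraries typically compose (not the paper's actual agent), the snippet below wires a Haiku network, an Optax optimiser, and an RLax Q-learning loss; the network shape, observation size, action count, and hyper-parameters are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp
import haiku as hk
import optax
import rlax

def q_net(obs):
    # Illustrative Q-network: a small MLP emitting 4 action values.
    return hk.nets.MLP([64, 64, 4])(obs)

net = hk.without_apply_rng(hk.transform(q_net))
params = net.init(jax.random.PRNGKey(0), jnp.zeros((1, 8)))
opt = optax.adam(1e-3)
opt_state = opt.init(params)

def loss_fn(params, obs_tm1, a_tm1, r_t, discount_t, obs_t):
    q_tm1 = net.apply(params, obs_tm1)
    q_t = net.apply(params, obs_t)
    # Per-transition one-step Q-learning TD error, batched with vmap.
    td = jax.vmap(rlax.q_learning)(q_tm1, a_tm1, r_t, discount_t, q_t)
    return jnp.mean(jnp.square(td))

@jax.jit
def update(params, opt_state, batch):
    # batch is a (obs_tm1, a_tm1, r_t, discount_t, obs_t) tuple of arrays.
    grads = jax.grad(loss_fn)(params, *batch)
    updates, opt_state = opt.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state
```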