When should agents explore?
Authors: Miruna Pislar, David Szepesvari, Georg Ostrovski, Diana L Borsa, Tom Schaul
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report a promising and detailed analysis on Atari, using two-mode exploration and switching at sub-episodic time-scales. |
| Researcher Affiliation | Industry | DeepMind, London, UK {mirunapislar,dsz,ostrovski,borsa,tom}@deepmind.com |
| Pseudocode | Yes | Algorithm 1 describes how this is done in pseudo-code. (A hedged sketch of such intra-episodic mode switching appears below this table.) |
| Open Source Code | No | The paper cites several open-source libraries used in the implementation (JAX, Haiku, Optax, Chex, RLax, and Reverb) and gives their GitHub URLs, but it does not release, or link to, code specific to the paper's own method or contributions. |
| Open Datasets | Yes | We conduct our investigations on a subset of games of the Arcade Learning Environment (Bellemare et al., 2013) |
| Dataset Splits | No | The paper describes data-handling details such as replay sequence length and burn-in, and how evaluation is performed, but it gives no explicit training/validation/test splits (percentages, sample counts, or predefined splits) for the Arcade Learning Environment. |
| Hardware Specification | Yes | one such experiment takes about 12 hours using 2 TPUs (one for the batch inference, the other for the learner) and 120 CPUs. |
| Software Dependencies | Yes | Our agent is implemented with JAX (Bradbury et al., 2018), uses the Haiku (Hennigan et al., 2020), Optax (Budden et al., 2020b), Chex (Budden et al., 2020a), and RLax (Hessel et al., 2020) libraries for neural networks, optimisation, testing, and RL losses, respectively, and Reverb (Cassirer et al., 2020) for distributed experience replay. (A hedged sketch of how these libraries typically compose appears after this table.) |
| Experiment Setup | Yes | Refer to Table 1 for a full list of hyper-parameters. |
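
The pseudocode row above refers to the paper's Algorithm 1. As a rough illustration only, here is a minimal Python sketch of two-mode exploration with switching at sub-episodic time-scales; the probabilistic "blind" trigger, the Gym-style `env`/policy interfaces, and all names are assumptions for illustration, not the paper's exact algorithm.

```python
import random

def run_episode(env, exploit_policy, explore_policy, switch_prob=0.05):
    """Act in 'exploit' mode by default, occasionally flipping into an
    'explore' mode mid-episode and back (sub-episodic switching).

    The probabilistic 'blind' trigger below is only one of the trigger
    types discussed in the paper; env and the two policies are
    hypothetical placeholders.
    """
    obs = env.reset()
    mode = "exploit"
    done = False
    episode_return = 0.0
    while not done:
        # Blind (time-based, probabilistic) switching: with small
        # probability, toggle the behaviour mode within the episode.
        if random.random() < switch_prob:
            mode = "explore" if mode == "exploit" else "exploit"
        policy = exploit_policy if mode == "exploit" else explore_policy
        obs, reward, done, _ = env.step(policy(obs))
        episode_return += reward
    return episode_return
```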
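The software-dependencies row lists the DeepMind JAX ecosystem. As a hedged sketch of how these libraries typically compose (not the paper's actual agent), the snippet below wires a Haiku network, an Optax optimiser, and an RLax Q-learning loss; the network shape, observation size, action count, and hyper-parameters are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp
import haiku as hk
import optax
import rlax

def q_net(obs):
    # Illustrative Q-network: a small MLP emitting 4 action values.
    return hk.nets.MLP([64, 64, 4])(obs)

net = hk.without_apply_rng(hk.transform(q_net))
params = net.init(jax.random.PRNGKey(0), jnp.zeros((1, 8)))
opt = optax.adam(1e-3)
opt_state = opt.init(params)

def loss_fn(params, obs_tm1, a_tm1, r_t, discount_t, obs_t):
    q_tm1 = net.apply(params, obs_tm1)
    q_t = net.apply(params, obs_t)
    # Per-transition one-step Q-learning TD error, batched with vmap.
    td = jax.vmap(rlax.q_learning)(q_tm1, a_tm1, r_t, discount_t, q_t)
    return jnp.mean(jnp.square(td))

@jax.jit
def update(params, opt_state, batch):
    # batch is a (obs_tm1, a_tm1, r_t, discount_t, obs_t) tuple of arrays.
    grads = jax.grad(loss_fn)(params, *batch)
    updates, opt_state = opt.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state
```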