Temporally-Extended ε-Greedy Exploration

Authors: Will Dabney, Georg Ostrovski, Andre Barreto

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present empirical results on tabular, linear, and deep RL settings, pursuing two objectives: The first is to demonstrate the generality of our method in applying it across domains as well as across multiple value-based reinforcement learning algorithms (Q-learning, SARSA, Rainbow, R2D2).
Researcher Affiliation | Industry | Will Dabney, Georg Ostrovski & André Barreto, DeepMind, London, UK, {wdabney,ostrovski,andrebarreto}@google.com
Pseudocode | Yes | Algorithm 1: ϵz-greedy exploration policy (a minimal sketch of this policy is given after the table).
Open Source Code | No | The paper does not provide any links to, or statements about releasing, source code for the described methodology.
Open Datasets | Yes | Atari-57 (Deep RL): Motivated by the results in tabular and linear settings, we now turn to deep RL and evaluate performance on 57 Atari 2600 games in the Arcade Learning Environment (ALE) (Bellemare et al., 2013).
Dataset Splits | No | The paper mentions evaluation phases during training (e.g., 'every 1M environment frames learning is frozen and the agent is evaluated for 500K environment frames') but does not provide quantitative training/validation/test splits (e.g., percentages, sample counts, or citations to predefined splits) for reproducibility.
Hardware Specification | Yes | Rainbow-based agents were implemented in Python using JAX, with each configuration (game, algorithm, hyper-parameter setting) run on a single V100 GPU.
Software Dependencies | No | The paper mentions software such as Python, JAX, and TensorFlow but does not give version numbers for these or any other dependencies needed for reproducibility.
Experiment Setup | Yes | Unless stated otherwise, hyper-parameters for our Rainbow-based agents follow the original implementation in Hessel et al. (2018), see Table 2. An exception is the Rainbow-CTS agent, which uses a regular dueling value network instead of the Noisy Nets variant, and also makes use of an ϵ-greedy policy (whereas the baseline Rainbow relies on its Noisy Nets value head for exploration). The ϵ parameter follows a linear decay schedule from 1.0 to 0.01 over the course of the first 4M frames, remaining constant after that. Evaluation happens with an even lower value of ϵ = 0.001. (A sketch of this decay schedule is given after the table.)
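
For orientation, the following is a minimal Python sketch of a temporally-extended ε-greedy (ϵz-greedy) actor in the spirit of Algorithm 1: with probability ε a random action and a duration n are sampled and the action is repeated for n steps; otherwise the greedy action is taken. The class and method names, the default exponent μ = 2 for the zeta (Zipf) duration distribution, and the cap on repeat length are assumptions for illustration, not taken from the paper's implementation.

```python
import numpy as np


class EpsilonZGreedy:
    """Minimal sketch of a temporally-extended eps-greedy (eps_z-greedy) actor.

    With probability epsilon, a uniformly random action is paired with a
    duration n drawn from a heavy-tailed zeta/Zipf distribution and repeated
    for n consecutive steps; otherwise the greedy action with respect to the
    current Q-values is taken. Names and defaults are illustrative assumptions.
    """

    def __init__(self, num_actions, epsilon=0.01, mu=2.0, max_repeat=1000, rng=None):
        self.num_actions = num_actions
        self.epsilon = epsilon        # exploration probability
        self.mu = mu                  # zeta exponent (assumed value)
        self.max_repeat = max_repeat  # practical cap on repeat length (assumption)
        self.rng = rng if rng is not None else np.random.default_rng()
        self.repeat_left = 0          # remaining steps of the current repeated action
        self.repeat_action = None

    def step(self, q_values):
        # Continue an ongoing temporally-extended exploratory action, if any.
        if self.repeat_left > 0:
            self.repeat_left -= 1
            return self.repeat_action
        # With probability epsilon, start a new repeated random action.
        if self.rng.random() < self.epsilon:
            self.repeat_action = int(self.rng.integers(self.num_actions))
            n = min(int(self.rng.zipf(self.mu)), self.max_repeat)
            self.repeat_left = n - 1  # this call counts as the first of the n steps
            return self.repeat_action
        # Otherwise, act greedily with respect to the current Q-values.
        return int(np.argmax(q_values))
```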
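
The ϵ decay quoted in the Experiment Setup row amounts to a simple linear annealing. The sketch below (function name and signature are assumptions) reproduces the stated values: a decay from 1.0 to 0.01 over the first 4M environment frames, held constant afterwards, with ϵ = 0.001 used during evaluation.

```python
def linear_epsilon(frame, start=1.0, end=0.01, decay_frames=4_000_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_frames`
    environment frames, then hold it constant (values from the quoted setup)."""
    fraction = min(frame / decay_frames, 1.0)
    return start + fraction * (end - start)


EVAL_EPSILON = 0.001  # fixed, lower epsilon used during evaluation phases
```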