Safe and Efficient Off-Policy Reinforcement Learning
Authors: Rémi Munos, Thomas Stepleton, Anna Harutyunyan, Marc G. Bellemare
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games. Finally, we illustrate the significance of Retrace(λ) in a deep learning setting by applying it to the suite of Atari 2600 games provided by the Arcade Learning Environment (Bellemare et al., 2013). We compare our algorithms' performance on 60 different Atari 2600 games in the Arcade Learning Environment (Bellemare et al., 2013) using Bellemare et al.'s inter-algorithm score distribution. |
| Researcher Affiliation | Collaboration | Rémi Munos (munos@google.com), Google DeepMind; Thomas Stepleton (stepleton@google.com), Google DeepMind; Anna Harutyunyan (anna.harutyunyan@vub.ac.be), Vrije Universiteit Brussel; Marc G. Bellemare (bellemare@google.com), Google DeepMind |
| Pseudocode | No | The paper defines mathematical operators and theoretical derivations, but does not include any pseudocode or algorithm blocks (a hedged sketch of the Retrace(λ) target computation is given below the table). |
| Open Source Code | No | The paper does not provide any statements about making its code open source or links to code repositories for the methodology described. |
| Open Datasets | Yes | Finally, we illustrate the significance of Retrace(λ) in a deep learning setting by applying it to the suite of Atari 2600 games provided by the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | To validate our theoretical results, we employ Retrace(λ) in an experience replay (Lin, 1993) setting, where sample transitions are stored within a large but bounded replay memory and subsequently replayed as if they were new experience. The paper does not specify exact dataset split percentages, sample counts, or reference predefined splits for training, validation, or testing. |
| Hardware Specification | No | The paper mentions a 'deep learning setting' but does not specify any hardware details such as GPU or CPU models, processors, or memory used for experiments. |
| Software Dependencies | No | Our agent adapts the DQN architecture of Mnih et al. (2015) to replay short sequences from the memory (details in the appendix) instead of single transitions. The paper mentions using the DQN architecture and Arcade Learning Environment but does not provide specific version numbers for any software or libraries used for reproducibility. |
| Experiment Setup | No | Our agent adapts the DQN architecture of Mnih et al. (2015) to replay short sequences from the memory (details in the appendix) instead of single transitions. The main text refers to details in the appendix but does not itself contain specific hyperparameters or training configurations for the experimental setup (an illustrative sketch of such a replay-based update is given below the table). |
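
The Pseudocode row above notes that the paper specifies Retrace(λ) only through its mathematical operator. As a reading aid, here is a minimal NumPy sketch of that operator for a single finite trajectory: it evaluates the target 𝓡Q(x_t, a_t) = Q(x_t, a_t) + Σ_{s≥t} γ^{s−t} (Π_{i=t+1}^{s} c_i) δ_s, with truncated importance weights c_i = λ min(1, π(a_i|x_i)/μ(a_i|x_i)) and TD errors δ_s = r_s + γ E_π Q(x_{s+1}, ·) − Q(x_s, a_s). The function and argument names are placeholders chosen here, not identifiers from the paper, and the quadratic double loop mirrors the operator's definition rather than an efficient implementation.

```python
import numpy as np

def retrace_targets(q_taken, exp_q_next, rewards, pi_probs, mu_probs,
                    gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one finite trajectory (illustrative sketch).

    q_taken[t]    = Q(x_t, a_t)                              shape [T]
    exp_q_next[t] = sum_a pi(a | x_{t+1}) * Q(x_{t+1}, a)    shape [T]
    rewards[t]    = r_t                                      shape [T]
    pi_probs[t]   = pi(a_t | x_t), target policy             shape [T]
    mu_probs[t]   = mu(a_t | x_t), behaviour policy          shape [T]
    """
    T = len(rewards)
    # Truncated importance weights c_i = lambda * min(1, pi(a_i|x_i) / mu(a_i|x_i)).
    c = lam * np.minimum(1.0, np.asarray(pi_probs) / np.asarray(mu_probs))
    # Off-policy TD errors delta_s = r_s + gamma * E_pi Q(x_{s+1}, .) - Q(x_s, a_s).
    delta = np.asarray(rewards) + gamma * np.asarray(exp_q_next) - np.asarray(q_taken)

    targets = np.empty(T)
    for t in range(T):
        correction = 0.0
        trace = 1.0                        # empty product of c's when s = t
        for s in range(t, T):
            if s > t:
                trace *= c[s]              # extend the product c_{t+1} ... c_s
            correction += gamma ** (s - t) * trace * delta[s]
        targets[t] = q_taken[t] + correction
    return targets
```

The same targets can be obtained in O(T) with the backward recursion correction_t = δ_t + γ c_{t+1} correction_{t+1}; the explicit double loop above is kept only because it matches the summation in the operator's definition term by term.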
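
Similarly, the Experiment Setup and Software Dependencies rows quote the paper's statement that the agent adapts the DQN architecture of Mnih et al. (2015) to replay short sequences from memory rather than single transitions, with the concrete configuration deferred to the appendix. The fragment below is only an illustration of how the Retrace targets from the sketch above could be plugged into such a sequence-replay update; the buffer API (`sample_sequence`), the `q_network` and `policy_probs` callables, the sequence length, and all hyperparameter values are assumptions made here, not details taken from the paper.

```python
import numpy as np

def replay_update_sketch(buffer, q_network, policy_probs,
                         seq_len=16, gamma=0.99, lam=1.0):
    """One illustrative update on a short replayed sequence (not the paper's code).

    Assumptions made for this sketch:
      buffer.sample_sequence(T) -> (states [T+1, ...], actions [T], rewards [T],
                                    mu_probs [T]) for a contiguous stored sequence,
                                    where mu_probs are the behaviour probabilities
                                    mu(a_t | x_t) recorded at acting time.
      q_network(states)         -> Q-values of shape [T+1, num_actions].
      policy_probs(q_values)    -> target-policy probabilities pi(. | x), same shape.
    """
    states, actions, rewards, mu_probs = buffer.sample_sequence(seq_len)

    q_all = q_network(states)                               # [T+1, A]
    pi_all = policy_probs(q_all)                            # [T+1, A]

    idx = np.arange(seq_len)
    q_taken = q_all[idx, actions]                           # Q(x_t, a_t)
    pi_taken = pi_all[idx, actions]                         # pi(a_t | x_t)
    exp_q_next = np.sum(pi_all[1:] * q_all[1:], axis=1)     # E_pi Q(x_{t+1}, .)

    # Retrace(lambda) targets from retrace_targets() above, regressed with a
    # squared-error loss that a DQN-style agent would minimise by gradient descent.
    targets = retrace_targets(q_taken, exp_q_next, rewards, pi_taken, mu_probs,
                              gamma=gamma, lam=lam)
    return np.mean((q_taken - targets) ** 2)
```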