Safe and Efficient Off-Policy Reinforcement Learning

Authors: Remi Munos, Tom Stepleton, Anna Harutyunyan, Marc Bellemare

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games. Finally, we illustrate the significance of Retrace(λ) in a deep learning setting by applying it to the suite of Atari 2600 games provided by the Arcade Learning Environment (Bellemare et al., 2013). We compare our algorithm's performance on 60 different Atari 2600 games in the Arcade Learning Environment (Bellemare et al., 2013) using Bellemare et al.'s inter-algorithm score distribution.
Researcher Affiliation | Collaboration | Rémi Munos (munos@google.com), Google DeepMind; Thomas Stepleton (stepleton@google.com), Google DeepMind; Anna Harutyunyan (anna.harutyunyan@vub.ac.be), Vrije Universiteit Brussel; Marc G. Bellemare (bellemare@google.com), Google DeepMind
Pseudocode | No | The paper defines mathematical operators and theoretical derivations, but does not include any pseudocode or algorithm blocks. (A hedged sketch of the Retrace(λ) target computation is given after this table.)
Open Source Code | No | The paper does not state that its code is open source, nor does it link to any code repositories for the described methodology.
Open Datasets | Yes | Finally, we illustrate the significance of Retrace(λ) in a deep learning setting by applying it to the suite of Atari 2600 games provided by the Arcade Learning Environment (Bellemare et al., 2013).
Dataset Splits | No | To validate our theoretical results, we employ Retrace(λ) in an experience replay (Lin, 1993) setting, where sample transitions are stored within a large but bounded replay memory and subsequently replayed as if they were new experience. The paper does not specify exact dataset split percentages or sample counts, nor does it reference predefined training, validation, or test splits.
Hardware Specification | No | The paper mentions a 'deep learning setting' but does not specify any hardware details such as GPU or CPU models, processors, or memory used for the experiments.
Software Dependencies | No | Our agent adapts the DQN architecture of Mnih et al. (2015) to replay short sequences from the memory (details in the appendix) instead of single transitions. The paper mentions the DQN architecture and the Arcade Learning Environment but does not provide specific version numbers for any software or libraries, as would be needed for reproducibility.
Experiment Setup | No | Our agent adapts the DQN architecture of Mnih et al. (2015) to replay short sequences from the memory (details in the appendix) instead of single transitions. The main text refers to details in the appendix but does not itself contain specific hyperparameters or training configurations for the experimental setup. (A sketch of a bounded sequence-replay buffer is given after this table.)
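
Since the paper contains no pseudocode, the following is a minimal sketch of how the Retrace(λ) target described in the paper can be computed for one replayed sequence. The function name `retrace_targets`, the NumPy interface, and the choice to truncate the off-policy correction at the end of the stored sequence are assumptions of this sketch, not details taken from the paper or its appendix.

```python
import numpy as np

def retrace_targets(q_sa, exp_q_next, rewards, pi_a, mu_a, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one replayed sequence of T transitions.

    q_sa       : Q(x_s, a_s) for each step s                          -- shape [T]
    exp_q_next : E_{a ~ pi} Q(x_{s+1}, a); 0 if x_{s+1} is terminal   -- shape [T]
    rewards    : r_s                                                  -- shape [T]
    pi_a       : pi(a_s | x_s), target-policy prob. of the taken action    -- shape [T]
    mu_a       : mu(a_s | x_s), behaviour-policy prob. of the taken action -- shape [T]
    Returns an array of length T with the Retrace targets for Q(x_s, a_s).
    """
    # Truncated importance weights c_s = lambda * min(1, pi/mu) used by Retrace.
    c = lam * np.minimum(1.0, np.asarray(pi_a) / np.asarray(mu_a))

    T = len(rewards)
    targets = np.zeros(T)
    # Backward recursion equivalent to
    #   Q_ret(x_s, a_s) = r_s + gamma * E_pi Q(x_{s+1}, .)
    #                   + gamma * c_{s+1} * (Q_ret(x_{s+1}, a_{s+1}) - Q(x_{s+1}, a_{s+1})).
    # The correction term is truncated to zero at the end of the stored sequence,
    # and sequences are assumed not to cross episode boundaries (both are
    # assumptions of this sketch).
    correction = 0.0
    for s in reversed(range(T)):
        targets[s] = rewards[s] + gamma * exp_q_next[s] + gamma * correction
        correction = c[s] * (targets[s] - q_sa[s])
    return targets
```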
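
The experiment description mentions a large but bounded replay memory from which short sequences are replayed; the actual configuration lives in the paper's appendix and is not reproduced here. Below is a minimal sketch of such a buffer, assuming FIFO eviction and a fixed sequence length; the class name `SequenceReplayMemory`, the capacity, and the sequence length are illustrative assumptions rather than values from the paper.

```python
import random
from collections import deque

class SequenceReplayMemory:
    """Bounded FIFO memory that returns short contiguous sequences of transitions."""

    def __init__(self, capacity=100_000, sequence_length=16):
        # Illustrative defaults; the paper's appendix specifies the real settings.
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.sequence_length = sequence_length

    def add(self, transition):
        # `transition` is whatever tuple the agent stores, e.g. (x, a, r, x_next, done).
        self.buffer.append(transition)

    def sample_sequence(self):
        # Sample a random contiguous window; handling of episode boundaries inside
        # the window is omitted in this sketch.
        assert len(self.buffer) >= self.sequence_length, "buffer not warm enough"
        start = random.randrange(len(self.buffer) - self.sequence_length + 1)
        return [self.buffer[start + i] for i in range(self.sequence_length)]
```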