Count-Based Exploration with the Successor Representation
Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling
AAAI 2020, pp. 5125-5133 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive empirical evaluation to demonstrate this and we introduce the substochastic successor representation (SSR) to also understand, theoretically, the behavior of such a bonus. |
| Researcher Affiliation | Collaboration | Marlos C. Machado (1), Marc G. Bellemare (1), Michael Bowling (2,3); (1) Google AI, Brain Team, (2) University of Alberta, (3) DeepMind Alberta |
| Pseudocode | No | The paper describes algorithms using equations and prose, but it does not contain explicit pseudocode blocks or sections labeled "Algorithm". |
| Open Source Code | Yes | The code used to generate all results in this section is available at: https://github.com/mcmachado/count_based_exploration_sr/tree/master/tabular. The code used to generate the reported results is available at: https://github.com/mcmachado/count_based_exploration_sr/tree/master/function_approximation. |
| Open Datasets | Yes | We evaluated our algorithm on the Arcade Learning Environment (Bellemare et al. 2013). |
| Dataset Splits | No | The paper describes the training process in terms of frames (e.g., "100 million frames") and seeds, but it does not provide explicit dataset splits (e.g., percentages or counts) for training, validation, and testing commonly found in supervised learning setups. |
| Hardware Specification | No | The paper discusses neural network architectures and deep reinforcement learning, implying computational resources, but it does not specify any particular hardware components like CPU models, GPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions software components such as "DQN", "RMSprop", and "Xavier initialization", but it does not provide specific version numbers for any of the software or libraries used (e.g., Python version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | We set β = 0.05 after a rough sweep over values in the game MONTEZUMA'S REVENGE. We annealed ϵ in DQN's ϵ-greedy exploration over the first million steps, starting at 1.0 and stopping at 0.1 as done by Bellemare et al. (2016). We trained the network with RMSprop with a step-size of 0.00025, an ϵ value of 0.01, and a decay of 0.95, which are the standard parameters for training DQN (Mnih et al. 2015). The discount factor, γ, is set to 0.99, and w_TD = 1, w_SR = 1000, w_Recons = 0.001. (These values are gathered into a configuration sketch after the table.) |
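
The hyperparameters reported in the Experiment Setup row can be gathered into a single configuration for reference. The Python sketch below is not the authors' code: names such as `dqn_ssr_config`, `bonus_beta`, and `epsilon_at` are illustrative assumptions, and only the numerical values are taken from the paper's reported setup.

```python
# Hedged sketch: the reported hyperparameters collected into one config.
# All field names are assumptions; only the values come from the paper.
dqn_ssr_config = {
    # Exploration-bonus weight, chosen after a rough sweep on Montezuma's Revenge.
    "bonus_beta": 0.05,
    # Epsilon-greedy schedule: annealed linearly over the first million steps.
    "epsilon_start": 1.0,
    "epsilon_final": 0.1,
    "epsilon_anneal_steps": 1_000_000,
    # RMSprop settings (standard DQN values from Mnih et al. 2015).
    "rmsprop_step_size": 0.00025,
    "rmsprop_epsilon": 0.01,
    "rmsprop_decay": 0.95,
    # Discount factor.
    "gamma": 0.99,
    # Loss-term weights: TD loss, successor-representation loss, reconstruction loss.
    "w_td": 1.0,
    "w_sr": 1000.0,
    "w_recons": 0.001,
}


def epsilon_at(step: int, cfg: dict = dqn_ssr_config) -> float:
    """Linearly annealed epsilon for epsilon-greedy exploration (hypothetical helper)."""
    frac = min(step / cfg["epsilon_anneal_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_final"] - cfg["epsilon_start"])
```

For example, `epsilon_at(0)` returns 1.0 and `epsilon_at(1_000_000)` returns 0.1, matching the annealing schedule quoted above.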