Count-Based Exploration with the Successor Representation
Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling
AAAI 2020, pp. 5125-5133 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive empirical evaluation to demonstrate this and we introduce the substochastic successor representation (SSR) to also understand, theoretically, the behavior of such a bonus. |
| Researcher Affiliation | Collaboration | Marlos C. Machado (1), Marc G. Bellemare (1), Michael Bowling (2,3); (1) Google AI, Brain Team, (2) University of Alberta, (3) DeepMind Alberta |
| Pseudocode | No | The paper describes algorithms using equations and prose, but it does not contain explicit pseudocode blocks or sections labeled "Algorithm". |
| Open Source Code | Yes | The code used to generate all results in this section is available at: https://github.com/mcmachado/count_based_exploration_sr/tree/master/tabular. The code used to generate the reported results is available at: https://github.com/mcmachado/count_based_exploration_sr/tree/master/function_approximation. |
| Open Datasets | Yes | We evaluated our algorithm on the Arcade Learning Environment (Bellemare et al. 2013). |
| Dataset Splits | No | The paper describes the training process in terms of frames (e.g., "100 million frames") and seeds, but it does not provide explicit dataset splits (e.g., percentages or counts) for training, validation, and testing commonly found in supervised learning setups. |
| Hardware Specification | No | The paper discusses neural network architectures and deep reinforcement learning, implying computational resources, but it does not specify any particular hardware components like CPU models, GPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions software components such as "DQN", "RMSprop", and "Xavier initialization", but it does not provide specific version numbers for any of the software or libraries used (e.g., Python version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | We set β = 0.05 after a rough sweep over values in the game MONTEZUMA'S REVENGE. We annealed ϵ in DQN's ϵ-greedy exploration over the first million steps, starting at 1.0 and stopping at 0.1 as done by Bellemare et al. (2016). We trained the network with RMSprop with a step-size of 0.00025, an ϵ value of 0.01, and a decay of 0.95, which are the standard parameters for training DQN (Mnih et al. 2015). The discount factor, γ, is set to 0.99, and w_TD = 1, w_SR = 1000, w_Recons = 0.001. (These values are gathered into a configuration sketch after the table.) |
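
The hyperparameters reported in the Experiment Setup row can be gathered into a single configuration for reference. The Python sketch below is not the authors' code: names such as `dqn_ssr_config`, `bonus_beta`, and `epsilon_at` are illustrative assumptions, and only the numerical values are taken from the paper's reported setup.

```python
# Hedged sketch: the reported hyperparameters collected into one config.
# All field names are assumptions; only the values come from the paper.
dqn_ssr_config = {
    # Exploration-bonus weight, chosen after a rough sweep on Montezuma's Revenge.
    "bonus_beta": 0.05,
    # Epsilon-greedy schedule: annealed linearly over the first million steps.
    "epsilon_start": 1.0,
    "epsilon_final": 0.1,
    "epsilon_anneal_steps": 1_000_000,
    # RMSprop settings (standard DQN values from Mnih et al. 2015).
    "rmsprop_step_size": 0.00025,
    "rmsprop_epsilon": 0.01,
    "rmsprop_decay": 0.95,
    # Discount factor.
    "gamma": 0.99,
    # Loss-term weights: TD loss, successor-representation loss, reconstruction loss.
    "w_td": 1.0,
    "w_sr": 1000.0,
    "w_recons": 0.001,
}


def epsilon_at(step: int, cfg: dict = dqn_ssr_config) -> float:
    """Linearly annealed epsilon for epsilon-greedy exploration (hypothetical helper)."""
    frac = min(step / cfg["epsilon_anneal_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_final"] - cfg["epsilon_start"])
```

For example, `epsilon_at(0)` returns 1.0 and `epsilon_at(1_000_000)` returns 0.1, matching the annealing schedule quoted above.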