Count-Based Exploration in Feature Space for Reinforcement Learning
Authors: Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation demonstrates that this simple approach achieves near state-of-the-art performance on high-dimensional RL benchmarks. |
| Researcher Affiliation | Academia | Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter; Research School of Computer Science, Australian National University, Canberra; jarrydmartinx@gmail.com, surajx@gmail.com, tom.everitt@anu.edu.au, marcus.hutter@anu.edu.au |
| Pseudocode | Yes | Algorithm 1: Reinforcement Learning with LFA and φ-EB (see the sketches after the table). |
| Open Source Code | No | The paper does not provide any statement about making its source code publicly available or link to a code repository. |
| Open Datasets | Yes | We evaluate our algorithm on five games from the Arcade Learning Environment (ALE), which has recently become a standard high-dimensional benchmark for RL [Bellemare et al., 2013]. |
| Dataset Splits | No | The paper mentions training and evaluation (testing) frames and episodes but does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Sarsa(λ) and a Blob-PROST feature set but does not provide specific version numbers for any software, libraries, or frameworks. |
| Experiment Setup | Yes | The β coefficient in the φ-exploration bonus was set to 0.05 for all games, after a coarse parameter search. This search was performed once, across a range of ALE games, and a value was chosen for which the agent achieved good scores in most games. The parameters for the Sarsa(λ) algorithm are set to the same values as in [Liang et al., 2016]. (The first sketch after the table shows how β scales the bonus.) |
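
The two sketches below are illustrative only. First, a minimal Python sketch of a φ-exploration bonus of the kind the Pseudocode and Experiment Setup rows describe: binary features are modelled independently, a pseudo-count is derived from the density and its recoding probability (following Bellemare et al., 2016), and β scales the resulting bonus. The class name, the Krichevsky-Trofimov per-feature estimator, and the 0.01 stabiliser inside the square root are assumptions, not the authors' exact implementation.

```python
import numpy as np

class PhiEBBonus:
    """Sketch of a phi-exploration-bonus (phi-EB) model over binary features.

    Each feature phi_i in {0, 1} is modelled independently with a
    Krichevsky-Trofimov estimator; the joint density rho(phi) is the
    product of the per-feature densities (matching the paper's
    factorised model, but not necessarily its exact estimator).
    """

    def __init__(self, num_features, beta=0.05):
        self.beta = beta                     # bonus coefficient; 0.05 in the paper
        self.t = 0                           # number of feature vectors seen so far
        self.ones = np.zeros(num_features)   # per-feature activation counts

    def _log_prob(self, phi, t, ones):
        # KT estimate per feature: P(phi_i = 1) = (ones_i + 0.5) / (t + 1).
        p1 = (ones + 0.5) / (t + 1.0)
        return float(np.sum(np.log(np.where(phi > 0, p1, 1.0 - p1))))

    def bonus(self, phi):
        """Update the density model with phi and return beta / sqrt(pseudo-count)."""
        log_rho = self._log_prob(phi, self.t, self.ones)              # rho_t(phi)
        log_rho2 = self._log_prob(phi, self.t + 1, self.ones + phi)  # recoding prob rho'_t(phi)
        # Pseudo-count N_hat = rho (1 - rho') / (rho' - rho), rearranged as
        # (1 - rho') / (rho'/rho - 1) so the ratio is computed stably in log space.
        ratio = np.exp(log_rho2 - log_rho)   # rho'/rho > 1 after observing phi
        n_hat = max((1.0 - np.exp(log_rho2)) / max(ratio - 1.0, 1e-12), 0.0)
        self.t += 1
        self.ones += phi
        # The 0.01 keeps the bonus finite for novel phi (a common
        # stabiliser, assumed here rather than taken from the paper).
        return self.beta / np.sqrt(n_hat + 0.01)
```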
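
Second, a sketch of the overall loop that Algorithm 1 names: Sarsa(λ) with linear function approximation over the same binary features, learning from the bonus-augmented reward. The classic Gym-style `env` interface and the helper functions are assumptions for illustration.

```python
def sa_features(phi, a, num_actions):
    # One block of weights per action; only the chosen action's block is active.
    x = np.zeros(len(phi) * num_actions)
    x[a * len(phi):(a + 1) * len(phi)] = phi
    return x

def epsilon_greedy(theta, phi, num_actions, eps):
    if np.random.rand() < eps:
        return np.random.randint(num_actions)
    q = [theta @ sa_features(phi, a, num_actions) for a in range(num_actions)]
    return int(np.argmax(q))

def train_episode(env, featurize, theta, bonus_model,
                  alpha=0.01, gamma=0.99, lam=0.9, eps=0.05):
    """One episode of Sarsa(lambda) with LFA and the phi-EB bonus."""
    num_actions = env.action_space.n
    e = np.zeros_like(theta)                 # eligibility traces
    phi = featurize(env.reset())             # binary feature vector phi(s)
    a = epsilon_greedy(theta, phi, num_actions, eps)
    done = False
    while not done:
        s2, r, done, _ = env.step(a)         # classic 4-tuple Gym step (assumed)
        phi2 = featurize(s2)
        r_aug = r + bonus_model.bonus(phi2)  # reward plus exploration bonus
        x = sa_features(phi, a, num_actions)
        if done:
            delta = r_aug - theta @ x
        else:
            a2 = epsilon_greedy(theta, phi2, num_actions, eps)
            delta = r_aug + gamma * (theta @ sa_features(phi2, a2, num_actions)) - theta @ x
        e = gamma * lam * e + x              # accumulating traces
        theta = theta + alpha * delta * e    # Sarsa(lambda) update
        if not done:
            phi, a = phi2, a2
    return theta
```

In the paper the Sarsa(λ) hyperparameters follow [Liang et al., 2016]; the defaults above are placeholders, not the reported settings.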