Count-Based Exploration in Feature Space for Reinforcement Learning

Authors: Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation demonstrates that this simple approach achieves near state-of-the-art performance on high-dimensional RL benchmarks.
Researcher Affiliation | Academia | Jarryd Martin, Suraj Narayanan S., Tom Everitt, Marcus Hutter; Research School of Computer Science, Australian National University, Canberra; jarrydmartinx@gmail.com, surajx@gmail.com, tom.everitt@anu.edu.au, marcus.hutter@anu.edu.au
Pseudocode | Yes | Algorithm 1: Reinforcement Learning with LFA and φ-EB.
Open Source Code | No | The paper does not state that its source code is publicly available, nor does it link to a code repository.
Open Datasets | Yes | We evaluate our algorithm on five games from the Arcade Learning Environment (ALE), which has recently become a standard high-dimensional benchmark for RL [Bellemare et al., 2013].
Dataset Splits | No | The paper mentions training and evaluation (testing) frames and episodes but does not specify a separate validation split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud resources) used for running the experiments.
Software Dependencies | No | The paper mentions using Sarsa(λ) and the Blob-PROST feature set but does not provide version numbers for any software, libraries, or frameworks.
Experiment Setup | Yes | The β coefficient in the φ-exploration bonus was set to 0.05 for all games, after a coarse parameter search. This search was performed once, across a range of ALE games, and a value was chosen for which the agent achieved good scores in most games. The parameters for the Sarsa(λ) algorithm are set to the same values as in [Liang et al., 2016].
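
For context on the φ-exploration bonus and the β = 0.05 setting referenced above, the sketch below is an illustrative reconstruction rather than the authors' released code. It assumes binary features (as with Blob-PROST), a factorised feature-visit density built from independent per-feature Krichevsky-Trofimov estimators, the pseudo-count formula of Bellemare et al. (2016), and a bonus of the form β / sqrt(N̂ + 0.01); the class and method names are hypothetical.

```python
import math


class PhiExplorationBonus:
    """Minimal sketch of a phi-EB style count-based exploration bonus.

    Assumed details (not verbatim from the paper): binary features, a
    factorised visit-density of independent per-feature KT estimators,
    the pseudo-count of Bellemare et al. (2016), and a bonus of
    beta / sqrt(N_hat + 0.01).
    """

    def __init__(self, num_features: int, beta: float = 0.05):
        self.beta = beta
        self.t = 0                        # number of feature vectors observed
        self.ones = [0] * num_features    # per-feature counts of phi_i = 1

    def _log_density(self, phi):
        """log prod_i rho_i(phi_i) under the current per-feature KT counts."""
        log_rho = 0.0
        for count, x in zip(self.ones, phi):
            p_one = (count + 0.5) / (self.t + 1.0)
            log_rho += math.log(p_one if x else 1.0 - p_one)
        return log_rho

    def bonus(self, phi):
        """Update the density model on phi and return the exploration bonus."""
        log_rho = self._log_density(phi)          # density before the update
        self.t += 1
        for i, x in enumerate(phi):
            self.ones[i] += x
        log_rho_prime = self._log_density(phi)    # "recoding" density after
        # Prediction gain PG = log(rho'/rho). The pseudo-count
        # N_hat = rho * (1 - rho') / (rho' - rho) is approximately
        # 1 / (exp(PG) - 1) when rho' is small, which avoids underflow
        # for long feature vectors.
        pg = max(log_rho_prime - log_rho, 1e-12)
        n_hat = 1.0 / math.expm1(pg)
        return self.beta / math.sqrt(n_hat + 0.01)


# Toy usage: augment an environment reward with the bonus.
bonus_model = PhiExplorationBonus(num_features=4, beta=0.05)
phi = [1, 0, 1, 0]
augmented_reward = 0.0 + bonus_model.bonus(phi)
```

In an agent of the kind the paper describes (Sarsa(λ) with linear function approximation), such a bonus would be added to the environment reward before the temporal-difference update, so rarely visited regions of feature space receive larger intrinsic rewards.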