Eigenoption Discovery through the Deep Successor Representation

Authors: Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm in a tabular domain as well as on Atari 2600 games. We use the tabular domain to provide intuition about our algorithm and to compare it to the algorithms in the literature. Our evaluation in Atari 2600 games provides promising evidence of the applicability of our algorithm in a setting in which a representation of the agent's observation is learned from raw pixels."
Researcher Affiliation | Collaboration | 1 University of Alberta, Edmonton, AB, Canada; 2 University of Massachusetts, Amherst, MA, USA; 3 IBM Research, Yorktown Heights, NY, USA
Pseudocode | Yes | Alg. 1, "Eigenoption discovery through the SR" (a Python sketch of this loop appears after the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described; there are no links or explicit statements about a code release.
Open Datasets | Yes | "We used four Atari 2600 games from the Arcade Learning Environment (Bellemare et al., 2013) as testbed: BANK HEIST, FREEWAY, MONTEZUMA'S REVENGE, and MS. PAC-MAN." (see the environment-loading sketch after the table)
Dataset Splits | No | The paper mentions building a dataset of 500,000 samples for training the network but does not specify explicit training/validation/test splits with proportions, counts, or predefined partitions.
Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models or memory capacity used for its experiments.
Software Dependencies | No | The paper mentions training with RMSProp but does not provide version numbers for any key software components or libraries.
Experiment Setup | Yes | "We followed the uniform random policy for 1,000 episodes to learn the SR. Episodes were 100 time steps long. We used a step size of 0.1, and we set γ = 0.9. (...) We used Q-learning (Watkins & Dayan, 1992) in our experiments (parameters λ = 0, α = 0.1, and γ = 0.9). (...) We passed through the shuffled dataset 10 times, using RMSProp with a step size of 10^-4." (a tabular SR-learning sketch with these hyperparameters follows the table)
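
As a companion to the Pseudocode row, here is a minimal Python sketch of the eigenoption-discovery loop that Alg. 1 describes, for the tabular case. All function and variable names are illustrative assumptions, not the paper's, and the option-learning inner loop is elided:

```python
import numpy as np

def discover_eigenoptions(sr_matrix, phi, num_options):
    """Illustrative sketch of Alg. 1 (tabular case): turn a learned
    successor representation into eigenpurposes. Names are assumptions."""
    # Eigendecomposition of the SR; its eigenvectors define eigenpurposes.
    eigenvalues, eigenvectors = np.linalg.eig(sr_matrix)
    order = np.argsort(eigenvalues.real)[::-1]            # largest first
    purposes = [eigenvectors[:, i].real for i in order[:num_options]]

    def intrinsic_reward(e, s, s_next):
        # Eigenpurpose reward: the change in representation projected onto e.
        return float(e @ (phi(s_next) - phi(s)))

    # Each eigenoption's policy would then be learned (e.g., with Q-learning)
    # to maximize its intrinsic reward; that inner loop is omitted here.
    return purposes, intrinsic_reward
```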
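The Open Datasets row names the four ALE games used as testbed. A minimal way to instantiate one, assuming the classic Gym Atari API (gym installed with the Atari/ALE extras); the environment IDs and version suffixes are assumptions:

```python
import gym  # assumes gym with the Atari/ALE extras installed

# The four games named above, under their (assumed) classic Gym IDs.
GAMES = ["BankHeist-v0", "Freeway-v0", "MontezumaRevenge-v0", "MsPacman-v0"]

env = gym.make(GAMES[2])                      # MONTEZUMA'S REVENGE
observation = env.reset()                     # raw-pixel observation
observation, reward, done, info = env.step(env.action_space.sample())
env.close()
```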
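Finally, a sketch of the tabular SR-learning step quoted in the Experiment Setup row, using the reported hyperparameters (1,000 episodes of 100 steps, step size 0.1, γ = 0.9). The environment interface is an assumption: reset() returns a state index and step(a) returns the next state index:

```python
import numpy as np

def learn_tabular_sr(env, n_states, n_actions,
                     episodes=1000, horizon=100, alpha=0.1, gamma=0.9,
                     seed=0):
    """Sketch of SR learning under a uniform random policy via TD(0).
    Assumed env API: reset() -> state index, step(a) -> next state index."""
    rng = np.random.default_rng(seed)
    psi = np.zeros((n_states, n_states))       # successor representation
    identity = np.eye(n_states)
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = rng.integers(n_actions)        # uniform random policy
            s_next = env.step(a)
            # TD(0) update: Psi(s,.) += alpha*(1{s} + gamma*Psi(s',.) - Psi(s,.))
            psi[s] += alpha * (identity[s] + gamma * psi[s_next] - psi[s])
            s = s_next
    return psi
```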