Eigenoption Discovery through the Deep Successor Representation

Authors: Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm in a tabular domain as well as on Atari 2600 games. We use the tabular domain to provide intuition about our algorithm and to compare it to the algorithms in the literature. Our evaluation in Atari 2600 games provides promising evidence of the applicability of our algorithm in a setting in which a representation of the agent's observation is learned from raw pixels."
Researcher Affiliation | Collaboration | 1 University of Alberta, Edmonton, AB, Canada; 2 University of Massachusetts, Amherst, MA, USA; 3 IBM Research, Yorktown Heights, NY, USA
Pseudocode | Yes | Alg. 1, "Eigenoption discovery through the SR" (a Python sketch of this loop appears after the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described; there are no links or explicit statements about a code release.
Open Datasets | Yes | "We used four Atari 2600 games from the Arcade Learning Environment (Bellemare et al., 2013) as testbed: BANK HEIST, FREEWAY, MONTEZUMA'S REVENGE, and MS. PAC-MAN." (see the environment-loading sketch after the table)
Dataset Splits | No | The paper mentions building a dataset of 500,000 samples for training the network but does not specify explicit training/validation/test splits with proportions, counts, or predefined partitions.
Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models or memory capacity used for its experiments.
Software Dependencies | No | The paper mentions training with RMSProp but does not provide version numbers for any key software components or libraries.
Experiment Setup | Yes | "We followed the uniform random policy for 1,000 episodes to learn the SR. Episodes were 100 time steps long. We used a step size of 0.1, and we set γ = 0.9. (...) We used Q-learning (Watkins & Dayan, 1992) in our experiments (parameters λ = 0, α = 0.1, and γ = 0.9). (...) We passed through the shuffled dataset 10 times, using RMSProp with a step size of 10^-4." (a tabular SR-learning sketch with these hyperparameters follows the table)
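
As a companion to the Pseudocode row, here is a minimal Python sketch of the eigenoption-discovery loop that Alg. 1 describes, for the tabular case. All function and variable names are illustrative assumptions, not the paper's, and the option-learning inner loop is elided:

```python
import numpy as np

def discover_eigenoptions(sr_matrix, phi, num_options):
    """Illustrative sketch of Alg. 1 (tabular case): turn a learned
    successor representation into eigenpurposes. Names are assumptions."""
    # Eigendecomposition of the SR; its eigenvectors define eigenpurposes.
    eigenvalues, eigenvectors = np.linalg.eig(sr_matrix)
    order = np.argsort(eigenvalues.real)[::-1]            # largest first
    purposes = [eigenvectors[:, i].real for i in order[:num_options]]

    def intrinsic_reward(e, s, s_next):
        # Eigenpurpose reward: the change in representation projected onto e.
        return float(e @ (phi(s_next) - phi(s)))

    # Each eigenoption's policy would then be learned (e.g., with Q-learning)
    # to maximize its intrinsic reward; that inner loop is omitted here.
    return purposes, intrinsic_reward
```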
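The Open Datasets row names the four ALE games used as testbed. A minimal way to instantiate one, assuming the classic Gym Atari API (gym installed with the Atari/ALE extras); the environment IDs and version suffixes are assumptions:

```python
import gym  # assumes gym with the Atari/ALE extras installed

# The four games named above, under their (assumed) classic Gym IDs.
GAMES = ["BankHeist-v0", "Freeway-v0", "MontezumaRevenge-v0", "MsPacman-v0"]

env = gym.make(GAMES[2])                      # MONTEZUMA'S REVENGE
observation = env.reset()                     # raw-pixel observation
observation, reward, done, info = env.step(env.action_space.sample())
env.close()
```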
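Finally, a sketch of the tabular SR-learning step quoted in the Experiment Setup row, using the reported hyperparameters (1,000 episodes of 100 steps, step size 0.1, γ = 0.9). The environment interface is an assumption: reset() returns a state index and step(a) returns the next state index:

```python
import numpy as np

def learn_tabular_sr(env, n_states, n_actions,
                     episodes=1000, horizon=100, alpha=0.1, gamma=0.9,
                     seed=0):
    """Sketch of SR learning under a uniform random policy via TD(0).
    Assumed env API: reset() -> state index, step(a) -> next state index."""
    rng = np.random.default_rng(seed)
    psi = np.zeros((n_states, n_states))       # successor representation
    identity = np.eye(n_states)
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = rng.integers(n_actions)        # uniform random policy
            s_next = env.step(a)
            # TD(0) update: Psi(s,.) += alpha*(1{s} + gamma*Psi(s',.) - Psi(s,.))
            psi[s] += alpha * (identity[s] + gamma * psi[s_next] - psi[s])
            s = s_next
    return psi
```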