Recurrent Environment Simulators

Authors: Silvia Chiappa, Sébastien Racanière, Daan Wierstra, Shakir Mohamed

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present an in-depth analysis of the factors affecting performance, providing the most extensive attempt to advance the understanding of the properties of these models. We test our simulators on three diverse and challenging families of environments, namely Atari 2600 games, a first-person game where an agent moves in randomly generated 3D mazes, and a 3D car racing environment; and show that they can be used for model-based exploration."
Researcher Affiliation | Industry | "Silvia Chiappa, Sébastien Racanière, Daan Wierstra & Shakir Mohamed, DeepMind, London, UK. {csilvia, sracaniere, wierstra, shakir}@google.com"
Pseudocode | Yes | for t = 1, episodeLength, d do: for n = 1, N do: choose random actions A^n = a_{t:t+d−1}; predict x̂^n_{t+1:t+d}; end; follow the actions in A^{n′}, where n′ = argmax_n min_{j=0,…,10} ‖x̂^n_{t+d} − x_{t−j}‖²; end. (A hedged Python sketch of this loop appears after the table.)
Open Source Code | No | The paper does not provide any concrete access (a specific link or an explicit statement of release) to the source code for the methodology described.
Open Datasets | Yes | "We used training and test datasets consisting of five and one million 210×160 RGB images respectively, with actions chosen from a trained DQN agent (Mnih et al., 2015) according to an ϵ = 0.2-greedy policy." The games are Atari 2600 games from the Arcade Learning Environment (Bellemare et al., 2013). (See the ϵ-greedy sketch after the table.)
Dataset Splits | No | The paper mentions training and test sets but does not specify a separate validation split for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models) used to run its experiments.
Software Dependencies | No | The paper mentions software (Torch, RMSProp) but does not provide version numbers for these or other ancillary components.
Experiment Setup | Yes | "As stochastic gradient algorithm, we used centered RMSProp (Graves, 2013) with learning rate 1e-5, epsilon 0.01, momentum 0.9, decay 0.95, and mini-batch size 16. We used a warm-up phase of length τ = 10 and we did not backpropagate the gradient to this phase." (See the optimizer sketch after the table.)
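
The exploration procedure quoted in the Pseudocode row is a rollout-and-select loop: sample N random action sequences of length d, simulate each, and follow the one whose predicted final frame is farthest from recently observed frames. Below is a minimal Python sketch of that loop; the simulator interface (predict_rollout), the gym-style environment API, and all names are hypothetical stand-ins rather than the authors' code.

```python
import numpy as np

def explore(env, simulator, episode_length, d=10, N=20, history=10):
    """Every d steps, sample N random action sequences, let the simulator
    predict each rollout, and follow the sequence whose predicted final
    frame maximizes the minimum squared L2 distance to the last
    history + 1 observed frames."""
    frames = [env.reset()]
    for t in range(0, episode_length, d):
        candidates = []
        for _ in range(N):
            actions = [env.action_space.sample() for _ in range(d)]
            predicted = simulator.predict_rollout(frames, actions)  # x̂_{t+1:t+d}
            candidates.append((actions, predicted[-1]))
        recent = frames[-(history + 1):]  # x_{t-j}, j = 0, ..., 10

        def novelty(final_frame):
            return min(float(np.sum((final_frame - x) ** 2)) for x in recent)

        best_actions, _ = max(candidates, key=lambda c: novelty(c[1]))
        for a in best_actions:  # execute the chosen sequence in the real env
            obs, *_ = env.step(a)
            frames.append(obs)
    return frames
```

The argmax-of-min score favors action sequences predicted to end far from every recently visited frame, which is what makes the loop exploratory.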
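
The Open Datasets row notes that data-collection actions came from a trained DQN following an ϵ = 0.2-greedy policy. A minimal sketch of that selection rule, assuming a hypothetical q_network that maps an observation to a vector of per-action values:

```python
import random
import numpy as np

def epsilon_greedy_action(q_network, observation, num_actions, epsilon=0.2):
    """With probability epsilon take a uniformly random action; otherwise
    take the greedy action under the (assumed pre-trained) Q-network."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    q_values = q_network(observation)  # expected shape: (num_actions,)
    return int(np.argmax(q_values))
```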
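
The Experiment Setup row fully specifies the optimizer, so it can be reproduced directly. The sketch below maps the reported settings onto PyTorch's torch.optim.RMSprop (the paper used Lua Torch, so this mapping, the placeholder model, and the loss are assumptions): decay 0.95 becomes alpha, and centered=True gives the Graves (2013) centered variant. It also shows one way to honor the τ = 10 warm-up that receives no gradient.

```python
import torch

# Placeholder recurrent model; sizes are illustrative, not from the paper.
model = torch.nn.LSTM(input_size=128, hidden_size=128, batch_first=True)

# Centered RMSProp with the reported hyperparameters.
optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=1e-5, alpha=0.95, eps=0.01, momentum=0.9, centered=True,
)

def training_step(batch, tau=10):
    """batch: (16, T, 128) sequences (mini-batch size 16, per the paper).
    Run a warm-up of length tau with gradients disabled, then train on
    the remainder of the sequence."""
    warmup, rest = batch[:, :tau], batch[:, tau:]
    with torch.no_grad():              # no gradient flows into the warm-up
        _, state = model(warmup)
    output, _ = model(rest, state)
    loss = torch.nn.functional.mse_loss(output, rest)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```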