Planning from Pixels using Inverse Dynamics Models

Authors: Keiran Paster, Sheila A. McIlraith, Jimmy Ba

ICLR 2021

Reproducibility variables, with the assessed result and the supporting LLM response for each:
Research Type: Experimental. LLM response: "We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches. We evaluate our world model on a diverse distribution of challenging visual goals in Atari games and the Deepmind Control Suite (Tassa et al., 2018) to assess both its accuracy and sample efficiency. In Table 1, we show the performance of our algorithm trained with only 500k agent steps."
Researcher Affiliation: Academia. LLM response: "Keiran Paster, Department of Computer Science, University of Toronto, Vector Institute, keirp@cs.toronto.edu; Sheila A. McIlraith & Jimmy Ba, Department of Computer Science, University of Toronto, Vector Institute, {sheila, jba}@cs.toronto.edu"
Pseudocode: Yes. LLM response: "Algorithm 1: GLAMOR"
Open Source Code: Yes. LLM response: "The code for training agents on both Atari and DM Control Suite along with evaluation code can be found at https://github.com/keirp/glamor."
Open Datasets: Yes. LLM response: "We evaluate our method on two types of environments: Atari games and control tasks in the Deepmind Control Suite (Tassa et al., 2018)."
Dataset Splits: No. LLM response: The paper does not explicitly describe a validation dataset split or strategy.
Hardware Specification: No. LLM response: The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies: No. LLM response: The paper mentions the AdamW optimizer and other high-level components but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup: Yes. LLM response: "Figure 5 shows the hyperparameters that were used to train our method." Hyperparameters listed in Figure 5:
  optimizer: AdamW
  weight-decay: 0.01
  normalization: Group Norm
  learning-rate: 5e-4
  replay-ratio: 4
  eps-steps: 3e5
  eps-final: 0.1
  min-steps-learn: 5e4
  buffer size: 1e6
  policy trials: 50
  state size: 512
  clip-p-actions: -3.15
  lstm-hidden-dim: 64
  lstm-layers: 1
  train tasks: 1000
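
To make the listed setup easier to scan, here is a minimal Python sketch that collects these hyperparameters into a configuration and shows one plausible way the planner-side entries (policy trials, clip-p-actions) enter GLAMOR's sample-and-score planning loop. The scoring rule, a log-likelihood ratio between a goal-conditioned inverse dynamics model and an unconditional action prior, follows the paper's description, but the interfaces (goal_model.sample, prior_model.log_prob, the plan function) and the exact role of clip-p-actions are assumptions, not the API of the released code at https://github.com/keirp/glamor.

    # Sketch only: hyperparameter names and values mirror Figure 5 of the paper;
    # the model interfaces (sample / log_prob) are hypothetical.
    HYPERPARAMS = {
        "optimizer": "AdamW",
        "weight_decay": 0.01,
        "normalization": "GroupNorm",
        "learning_rate": 5e-4,
        "replay_ratio": 4,        # gradient updates per environment step (assumed meaning)
        "eps_steps": 3e5,         # exploration-epsilon annealing steps (assumed meaning)
        "eps_final": 0.1,
        "min_steps_learn": 5e4,
        "buffer_size": 1e6,
        "policy_trials": 50,      # candidate action sequences sampled per planning call
        "state_size": 512,
        "clip_p_actions": -3.15,  # lower clip on the prior's per-action log-prob (assumed usage)
        "lstm_hidden_dim": 64,
        "lstm_layers": 1,
        "train_tasks": 1000,
    }

    def plan(goal_model, prior_model, obs, goal, horizon, cfg=HYPERPARAMS):
        """Return the best of cfg['policy_trials'] sampled action sequences.

        Each candidate is scored by the likelihood ratio between the
        goal-conditioned inverse dynamics model and the action prior,
        summed over timesteps, with the prior term clipped from below
        by clip_p_actions (assumed usage of that hyperparameter).
        """
        best_score, best_actions = float("-inf"), None
        for _ in range(int(cfg["policy_trials"])):
            # Hypothetical interface: an action sequence plus its per-step
            # log-probabilities under the goal-conditioned model.
            actions, logp_goal = goal_model.sample(obs, goal, horizon)
            logp_prior = prior_model.log_prob(obs, actions)  # per-step log-probs
            score = sum(lg - max(lp, cfg["clip_p_actions"])
                        for lg, lp in zip(logp_goal, logp_prior))
            if score > best_score:
                best_score, best_actions = score, actions
        return best_actions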