Planning from Pixels using Inverse Dynamics Models
Authors: Keiran Paster, Sheila A. McIlraith, Jimmy Ba
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches. We evaluate our world model on a diverse distribution of challenging visual goals in Atari games and the Deepmind Control Suite (Tassa et al., 2018) to assess both its accuracy and sample efficiency. In Table 1, we show the performance of our algorithm trained with only 500k agent steps. |
| Researcher Affiliation | Academia | Keiran Paster Department of Computer Science University of Toronto, Vector Institute keirp@cs.toronto.edu Sheila A. McIlraith & Jimmy Ba Department of Computer Science University of Toronto, Vector Institute {sheila, jba}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: GLAMOR |
| Open Source Code | Yes | The code for training agents on both Atari and DM Control Suite along with evaluation code can be found at https://github.com/keirp/glamor. |
| Open Datasets | Yes | We evaluate our method on two types of environments: Atari games and control tasks in the Deepmind Control Suite (Tassa et al., 2018). |
| Dataset Splits | No | The paper does not explicitly describe a validation dataset split or strategy. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions the AdamW optimizer and other high-level components but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Figure 5 shows the hyperparameters that were used to train our method: optimizer: AdamW; weight-decay: 0.01; normalization: GroupNorm; learning-rate: 5e-4; replay-ratio: 4; eps-steps: 3e5; eps-final: 0.1; min-steps-learn: 5e4; buffer-size: 1e6; policy-trials: 50; state-size: 512; clip-p-actions: -3.15; lstm-hidden-dim: 64; lstm-layers: 1; train-tasks: 1000 |
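For readability, the reported hyperparameters can be gathered into a single configuration object. The sketch below is illustrative only: the values are copied verbatim from the Figure 5 listing quoted above, but the dict structure, key spellings, and the interpretive comments are assumptions rather than the released GLAMOR configuration; the linked repository remains the authoritative source.

```python
# Hyperparameters reported in Figure 5 of the paper, gathered into a plain dict.
# Key names and grouping are illustrative; values are copied from the table above.
# Consult https://github.com/keirp/glamor for the authoritative configuration.
GLAMOR_HYPERPARAMS = {
    "optimizer": "AdamW",
    "weight_decay": 0.01,
    "normalization": "GroupNorm",
    "learning_rate": 5e-4,
    "replay_ratio": 4,
    "eps_steps": 3e5,        # interpretation assumed: exploration annealing steps
    "eps_final": 0.1,        # interpretation assumed: final exploration epsilon
    "min_steps_learn": 5e4,  # interpretation assumed: steps collected before learning
    "buffer_size": 1e6,
    "policy_trials": 50,
    "state_size": 512,
    "clip_p_actions": -3.15,
    "lstm_hidden_dim": 64,
    "lstm_layers": 1,
    "train_tasks": 1000,
}

if __name__ == "__main__":
    # Print the settings one per line for quick inspection.
    for name, value in GLAMOR_HYPERPARAMS.items():
        print(f"{name}: {value}")
```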