Planning from Pixels using Inverse Dynamics Models
Authors: Keiran Paster, Sheila A. McIlraith, Jimmy Ba
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches. We evaluate our world model on a diverse distribution of challenging visual goals in Atari games and the Deepmind Control Suite (Tassa et al., 2018) to assess both its accuracy and sample efficiency. In Table 1, we show the performance of our algorithm trained with only 500k agent steps. |
| Researcher Affiliation | Academia | Keiran Paster Department of Computer Science University of Toronto, Vector Institute keirp@cs.toronto.edu Sheila A. McIlraith & Jimmy Ba Department of Computer Science University of Toronto, Vector Institute {sheila, jba}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: GLAMOR |
| Open Source Code | Yes | The code for training agents on both Atari and DM Control Suite along with evaluation code can be found at https://github.com/keirp/glamor. |
| Open Datasets | Yes | We evaluate our method on two types of environments: Atari games and control tasks in the Deepmind Control Suite (Tassa et al., 2018). |
| Dataset Splits | No | The paper does not explicitly describe a validation dataset split or strategy. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions the AdamW optimizer and other high-level components but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Figure 5 shows the hyperparameters that were used to train our method: optimizer: AdamW; weight-decay: 0.01; normalization: GroupNorm; learning-rate: 5e-4; replay-ratio: 4; eps-steps: 3e5; eps-final: 0.1; min-steps-learn: 5e4; buffer-size: 1e6; policy-trials: 50; state-size: 512; clip-p-actions: -3.15; lstm-hidden-dim: 64; lstm-layers: 1; train-tasks: 1000 |
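For readability, the reported hyperparameters can be gathered into a single configuration object. The sketch below is illustrative only: the values are copied verbatim from the Figure 5 listing quoted above, but the dict structure, key spellings, and the interpretive comments are assumptions rather than the released GLAMOR configuration; the linked repository remains the authoritative source.

```python
# Hyperparameters reported in Figure 5 of the paper, gathered into a plain dict.
# Key names and grouping are illustrative; values are copied from the table above.
# Consult https://github.com/keirp/glamor for the authoritative configuration.
GLAMOR_HYPERPARAMS = {
    "optimizer": "AdamW",
    "weight_decay": 0.01,
    "normalization": "GroupNorm",
    "learning_rate": 5e-4,
    "replay_ratio": 4,
    "eps_steps": 3e5,        # interpretation assumed: exploration annealing steps
    "eps_final": 0.1,        # interpretation assumed: final exploration epsilon
    "min_steps_learn": 5e4,  # interpretation assumed: steps collected before learning
    "buffer_size": 1e6,
    "policy_trials": 50,
    "state_size": 512,
    "clip_p_actions": -3.15,
    "lstm_hidden_dim": 64,
    "lstm_layers": 1,
    "train_tasks": 1000,
}

if __name__ == "__main__":
    # Print the settings one per line for quick inspection.
    for name, value in GLAMOR_HYPERPARAMS.items():
        print(f"{name}: {value}")
```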