Modeling the Long Term Future in Model-Based Reinforcement Learning

Authors: Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: As discussed in Section 3, we study our proposed model under imitation learning and model-based RL. We perform experiments to answer the following questions: ... Our method achieves higher reward faster compared to baselines on a variety of tasks and environments in both the imitation learning and model-based reinforcement learning settings.
Researcher Affiliation | Collaboration | Mila, Université de Montréal; Facebook AI Research; Polytechnique Montréal; CIFAR Senior Fellow; work done at Facebook AI Research; * Georgia Institute of Technology
Pseudocode | Yes | Algorithm 1: Model Predictive Control (MPC) ... Algorithm 2: Overall Algorithm (see the generic MPC sketch after the table)
Open Source Code | No | No explicit statement or link was found indicating that the authors' own code for the methodology described in this paper is open-source. The paper mentions building on another open-sourced project but not releasing their own.
Open Datasets | Yes | We evaluate our model on continuous control tasks in Mujoco and Car Racing environments, as well as a partially observable 2D grid-world environment with subgoals called BabyAI (Chevalier-Boisvert & Willems, 2018).
Dataset Splits | No | No explicit mention of train/validation/test splits of a specific dataset was found. The paper states training on '10k expert trajectories' and evaluating on a 'real test environment' but does not provide specific percentages or counts for validation splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running experiments were provided in the paper. General computing resources from 'Compute Canada' and funding from 'Nvidia' are mentioned, but without specific model numbers.
Software Dependencies | No | No specific software versions were mentioned for the dependencies. The paper mentions using the 'Adam optimizer' and 'PPO' but without version numbers or explicit framework dependencies such as PyTorch/TensorFlow versions.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) and tune learning rates using [1e-3, 5e-4, 1e-4, 5e-5]. For the hyperparameters specific to our model, we tune the KL starting weight between [0.15, 0.2, 0.25]; the KL weight increase per iteration is fixed at 0.0005, and the auxiliary cost for predicting the backward hidden state b_t is kept at 0.0005 for all experiments. ... All models are trained for 50 epochs. (See the configuration sketch after the table.)
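
The Experiment Setup row quotes the optimizer choice and the KL-weight schedule. The snippet below is a minimal sketch of how those reported values could be organized as a hyperparameter search; the dictionary keys, the grid enumeration, and the linear KL-annealing helper (including its cap) are illustrative assumptions, not code from the paper.

```python
from itertools import product

# Values quoted from the paper's experiment setup.
LEARNING_RATES = [1e-3, 5e-4, 1e-4, 5e-5]   # tuned
KL_START_WEIGHTS = [0.15, 0.2, 0.25]        # tuned
KL_WEIGHT_INCREASE = 0.0005                 # fixed, per iteration
AUX_COST_BACKWARD_STATE = 0.0005            # fixed (predicting b_t)
EPOCHS = 50

def kl_weight(start_weight, iteration, max_weight=1.0):
    """Linearly anneal the KL weight by a fixed increment per iteration.
    The cap at `max_weight` is an assumption; the paper only reports the
    starting weight and the per-iteration increase."""
    return min(start_weight + KL_WEIGHT_INCREASE * iteration, max_weight)

# Enumerate the grid of tuned hyperparameters (hypothetical organization).
search_grid = [
    {"lr": lr, "kl_start": kl0,
     "aux_cost": AUX_COST_BACKWARD_STATE, "epochs": EPOCHS}
    for lr, kl0 in product(LEARNING_RATES, KL_START_WEIGHTS)
]
```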
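
The Pseudocode row notes that the paper includes Algorithm 1 (Model Predictive Control). As a hedged illustration only, the sketch below shows a generic random-shooting MPC loop over a learned dynamics model; the callables `dynamics_model` and `reward_fn`, the uniform action sampling, and the horizon/candidate counts are assumptions for illustration, not the paper's Algorithm 1.

```python
import numpy as np

def mpc_action(state, dynamics_model, reward_fn, action_dim,
               horizon=10, n_candidates=500, rng=None):
    """Generic random-shooting MPC: sample candidate action sequences,
    roll each out through the learned dynamics model, and return the
    first action of the highest-return sequence.

    `dynamics_model(state, action)` and `reward_fn(state, action)` are
    placeholder callables, not the paper's actual model interface.
    """
    rng = rng or np.random.default_rng()
    # Sample candidate action sequences uniformly in [-1, 1]^action_dim.
    candidates = rng.uniform(-1.0, 1.0,
                             size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)

    for i, actions in enumerate(candidates):
        sim_state = state
        for a in actions:
            returns[i] += reward_fn(sim_state, a)
            sim_state = dynamics_model(sim_state, a)  # imagined next state

    best = int(np.argmax(returns))
    return candidates[best, 0]  # execute only the first action, then replan
```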