Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
Authors: Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, Jinwoo Shin
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of our CaDM method to answer the following questions: Is CaDM more robust to dynamics changes compared to other model-based RL methods (see Table 1)? Can CaDM be combined with model-free RL methods to improve their generalization abilities (see Table 2)? Does the proposed prediction loss (1) improve the test performance (see Figure 6(a))? Can CaDM make accurate predictions (see Figure 6(b) and Figure 7)? Does our context encoder extract meaningful contextual information (see Figure 6(c))? |
| Researcher Affiliation | Collaboration | Kimin Lee 1 * Younggyo Seo 2 * Seunghyun Lee 2 Honglak Lee 3 4 Jinwoo Shin 2; 1UC Berkeley, 2KAIST, 3University of Michigan Ann Arbor, 4Google Brain. |
| Pseudocode | Yes | Algorithm 1 Training context-aware dynamics model |
| Open Source Code | Yes | Our code is available at https://github.com/younggyoseo/CaDM. |
| Open Datasets | Yes | We demonstrate the effectiveness of our proposed method on simulated robots (i.e., HalfCheetah, Ant, Crippled HalfCheetah, and SlimHumanoid) using the MuJoCo physics engine (Todorov et al., 2012) and classic control tasks (i.e., CartPole and Pendulum) from OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes training and test environments and their parameter ranges, but does not explicitly mention a validation set or a procedure for hyperparameter tuning or early stopping, beyond selecting 'the model with the highest average return during training' (which could imply using the test set as a proxy for validation). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the 'Mu Jo Co physics engine (Todorov et al., 2012)' and 'Open AI Gym (Brockman et al., 2016)' but does not provide specific software version numbers for these or other dependencies like Python or deep learning frameworks. |
| Experiment Setup | No | The paper states 'Due to space limitation, we provide details about model architecture and hyperparameters in the supplementary material.', indicating these details are not in the main text. |