MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
Authors: Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study three complex visuo-motor control domains and find that our method is 160%–250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100k interaction steps, 5 demonstrations). |
| Researcher Affiliation | Collaboration | ¹Meta AI, ²University of California San Diego; {nihansen,haosu,xiw012}@ucsd.edu, {vikashplus,aravraj}@meta.com |
| Pseudocode | Yes | Algorithm 1 Model-Based Reinforcement Learning with Demonstrations (MoDem) |
| Open Source Code | Yes | Code and videos are available at https://nicklashansen.github.io/modemrl. ... We provide extensive implementation details in appendices, and have made our full implementation available at https://github.com/facebookresearch/modem. |
| Open Datasets | Yes | Experiments are conducted with publicly available environments. ... We evaluate methods extensively across three domains: Adroit (Rajeswaran et al., 2018), Meta-World (Yu et al., 2019), and DMControl (Tassa et al., 2018). |
| Dataset Splits | No | The paper mentions evaluating methods under a budget of '100k online interactions' and using '5 demonstrations', which refers to the overall interaction budget for learning and evaluation, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and testing sets) in the traditional sense of static datasets. The experiments are conducted in an online reinforcement learning setting where data is collected interactively rather than from a pre-defined static split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions a 'PyTorch-like overview of our architecture' and 'Adam' as an optimizer, but it does not specify exact version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Table 5. MoDem hyperparameters. We list all relevant hyperparameters for our proposed method below. Highlighted rows are unique to MoDem, whereas the remainder are inherited from TD-MPC but included for completeness. |