MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations

Authors: Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically study three complex visuo-motor control domains and find that our method is 160%-250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100k interaction steps, 5 demonstrations).
Researcher Affiliation | Collaboration | ¹Meta AI, ²University of California San Diego; {nihansen,haosu,xiw012}@ucsd.edu, {vikashplus,aravraj}@meta.com
Pseudocode | Yes | Algorithm 1 Model-Based Reinforcement Learning with Demonstrations (MoDem)
Open Source Code | Yes | Code and videos are available at https://nicklashansen.github.io/modemrl. ... We provide extensive implementation details in appendices, and have made our full implementation available at https://github.com/facebookresearch/modem.
Open Datasets | Yes | Experiments are conducted with publicly available environments. ... We evaluate methods extensively across three domains: Adroit (Rajeswaran et al., 2018), Meta-World (Yu et al., 2019), and DMControl (Tassa et al., 2018).
Dataset Splits | No | The paper mentions evaluating methods under a budget of '100k online interactions' and using '5 demonstrations', which refers to the overall interaction budget for learning and evaluation, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and testing sets) in the traditional sense of static datasets. The experiments are conducted in an online reinforcement learning setting where data is collected interactively rather than from a pre-defined static split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch-like overview of our architecture' and 'Adam' as an optimizer, but it does not specify exact version numbers for software dependencies like Python, PyTorch, or other libraries.
Experiment Setup | Yes | Table 5. MoDem hyperparameters. We list all relevant hyperparameters for our proposed method below. Highlighted rows are unique to MoDem, whereas the remainder are inherited from TD-MPC but included for completeness.
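
The Dataset Splits entry describes an online setting with a fixed interaction budget rather than a static train/validation/test split. The Python sketch below illustrates that bookkeeping only and is not the MoDem algorithm. The `ToyEnv` class, the random placeholder policy, the episode length of 100, and the choice not to count demonstration transitions against the online budget are all assumptions made for illustration; only the figures of 5 demonstrations and 100k interaction steps come from the summary above.

```python
import random

NUM_DEMOS = 5          # number of demonstrations, as quoted in the summary
STEP_BUDGET = 100_000  # online interaction steps, as quoted in the summary
EPISODE_LEN = 100      # hypothetical episode length, chosen for illustration


class ToyEnv:
    """Hypothetical stand-in environment, not one of the paper's benchmarks."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 0.0  # this toy never emits a reward
        done = self.t >= EPISODE_LEN
        return obs, reward, done


def collect_episode(env, policy):
    """Roll out one episode and return its list of transitions."""
    obs, done, episode = env.reset(), False, []
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        episode.append((obs, action, reward, next_obs))
        obs = next_obs
    return episode


def run(step_budget=STEP_BUDGET, num_demos=NUM_DEMOS, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()

    def policy(obs):
        # Placeholder for both the demonstrator and the learning agent.
        return rng.uniform(-1.0, 1.0)

    buffer = []
    # Seed the buffer with demonstrations. Assumption: these transitions
    # are not counted against the online interaction budget.
    for _ in range(num_demos):
        buffer.extend(collect_episode(env, policy))
    demo_transitions = len(buffer)

    online_steps = 0
    while online_steps < step_budget:
        episode = collect_episode(env, policy)
        buffer.extend(episode)
        online_steps += len(episode)
        # A real agent would update its model and policy here.
    return demo_transitions, online_steps, len(buffer)
```

Calling `run(step_budget=1000)` makes the accounting quick to inspect: every transition the agent trains on is either a demonstration or one it collected itself within the budget, so there is no held-out split to report.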