Model-Based Reinforcement Learning via Imagination with Derived Memory

Authors: Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Li, Chongjie Zhang, Jianye Hao

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various high-dimensional visual control tasks in the DMControl benchmark demonstrate that IDM outperforms previous state-of-the-art methods in terms of policy robustness and further improves the sample efficiency of the model-based method. Ablation experiment results verify the superiority of IDM.
Researcher Affiliation | Collaboration | Yao Mu (The University of Hong Kong, muyao@connect.hku.hk); Yuzheng Zhuang (Huawei Noah's Ark Lab, zhuangyuzheng@huawei.com); Bin Wang (Huawei Noah's Ark Lab, wangbin158@huawei.com); Guangxiang Zhu (Tsinghua University, guangxiangzhu@outlook.com); Wulong Liu (Huawei Noah's Ark Lab, liuwulong@huawei.com); Jianyu Chen (Tsinghua University, jianyuchen@tsinghua.edu.cn); Ping Luo (The University of Hong Kong, pluo@cs.hku.hk); Shengbo Eben Li (Tsinghua University, lishbo@tsinghua.edu.cn); Chongjie Zhang (Tsinghua University, chongjie@tsinghua.edu.cn); Jianye Hao (Huawei Noah's Ark Lab, haojianye@huawei.com)
Pseudocode | Yes | The IDM framework is implemented as Algorithm 1 (see Appendix A.2).
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | The experiments are implemented in DMControl tasks [18] with observation uncertainty. [...] We test the performance of IDM in DMControl environments without image uncertainty with 5 random seeds.
Dataset Splits | No | No explicit mention of training, validation, or test dataset splits (e.g., percentages or counts) was found.
Hardware Specification | Yes | The hardware setting is NVIDIA GeForce RTX 3090 GPU and 32GB RAM.
Software Dependencies | Yes | All the experiments were implemented with Pytorch 1.9.0, Python 3.8.5, and cuda 11.1.
Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A.2