Offline Reinforcement Learning with Reverse Model-based Imagination
Authors: Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines. Our method achieves the best or comparable performance on 16 out of 24 tasks among all algorithms. Ablation studies verify that the reverse model imagination can effectively generate more conservative behaviors and achieve state-of-the-art performance on offline RL benchmark tasks. |
| Researcher Affiliation | Collaboration | Jianhao Wang1, Wenzhe Li1, Haozhe Jiang1, Guangxiang Zhu2, Siyuan Li1, Chongjie Zhang1; 1Institute for Interdisciplinary Information Sciences, Tsinghua University, China; 2Baidu Inc., China |
| Pseudocode | Yes | Algorithm 1 ROMI: Reverse Offline Model-based Imagination |
| Open Source Code | No | The paper states: "Videos of the experiments are available online3. 3https://sites.google.com/view/romi-offlinerl/". This link is explicitly for videos of experiments, not for the source code of the methodology itself. |
| Open Datasets | Yes | We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines. |
| Dataset Splits | Yes | We evaluate ROMI on a wide range of domains in the D4RL benchmark [18], including the Maze2D domain, the Gym-MuJoCo tasks, and the AntMaze domain. D4RL benchmark [18] introduces three flavors of datasets (i.e., fixed, diverse, and play) in this setting, which commands the ant from different types of start locations to various types of goals. |
| Hardware Specification | No | The provided text does not contain any specific details about the hardware used for the experiments (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper mentions using a 'conditional variational autoencoder (CVAE) [20, 8]' and 'model-free offline RL algorithm (e.g., BCQ [8] and CQL [11])' but does not specify version numbers for any software components. |
| Experiment Setup | Yes | Algorithm 1: ROMI: Reverse Offline Model-based Imagination. 1: Require: Offline dataset Denv, rollout horizon h, the number of iterations Cφ, Cθ, T, learning rates αφ, αθ, model-free offline RL algorithm (i.e., BCQ or CQL). |
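The Experiment Setup row quotes the Require line of Algorithm 1: ROMI rolls a learned reverse model backward from dataset states for a fixed horizon, then feeds the real and imagined transitions to a model-free offline RL algorithm (e.g., BCQ or CQL). The backward-rollout step can be sketched as follows. Note this is a minimal illustration under stated assumptions, not the paper's implementation: `reverse_policy` and `reverse_model` are hypothetical placeholders standing in for the paper's CVAE rollout policy and learned reverse dynamics model.

```python
import numpy as np

rng = np.random.default_rng(0)

def reverse_policy(state):
    # Placeholder for ROMI's CVAE-style rollout policy: proposes a
    # preceding action a_{t-1} conditioned on the current state s_t.
    return rng.normal(size=state.shape)

def reverse_model(state, action):
    # Placeholder for the learned reverse dynamics model: predicts the
    # predecessor state s_{t-1} and its reward given (s_t, a_{t-1}).
    prev_state = state - 0.1 * action
    reward = float(-np.linalg.norm(state))
    return prev_state, reward

def reverse_imagination(dataset_states, horizon):
    """Roll backward from real dataset states to generate imagined
    transitions, stored in forward order (s_{t-1}, a_{t-1}, r, s_t)."""
    imagined = []
    for start in dataset_states:
        state = start
        for _ in range(horizon):
            action = reverse_policy(state)
            prev_state, reward = reverse_model(state, action)
            imagined.append((prev_state, action, reward, state))
            state = prev_state  # continue the backward trajectory
    return imagined

# Augment the offline dataset with the imagined transitions, then train
# a model-free offline RL algorithm (e.g., BCQ or CQL) on the union.
states = [rng.normal(size=3) for _ in range(4)]
buffer = reverse_imagination(states, horizon=5)
print(len(buffer))  # 4 start states x 5 backward steps = 20 transitions
```

Because every imagined trajectory terminates at a real dataset state, backward rollouts lead into the data distribution rather than away from it, which is the intuition behind the "more conservative behaviors" claim in the Research Type row.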