Offline Reinforcement Learning with Reverse Model-based Imagination

Authors: Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines. Our method achieves the best or comparable performance on 16 out of 24 tasks among all algorithms. Ablation studies verify that the reverse model imagination can effectively generate more conservative behaviors and achieve state-of-the-art performance on offline RL benchmark tasks."
Researcher Affiliation | Collaboration | Jianhao Wang (1), Wenzhe Li (1), Haozhe Jiang (1), Guangxiang Zhu (2), Siyuan Li (1), Chongjie Zhang (1); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, China; (2) Baidu Inc., China
Pseudocode | Yes | Algorithm 1, "ROMI: Reverse Offline Model-based Imagination" (a reverse-rollout sketch follows the table).
Open Source Code | No | The paper states: "Videos of the experiments are available online: https://sites.google.com/view/romi-offlinerl/". This link is explicitly for videos of the experiments, not for the source code of the method itself.
Open Datasets | Yes | "We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines."
Dataset Splits | Yes | "We evaluate ROMI on a wide range of domains in the D4RL benchmark [18], including the Maze2D domain, the Gym-MuJoCo tasks, and the AntMaze domain." The D4RL benchmark [18] introduces three flavors of datasets (i.e., fixed, diverse, and play) in this setting, which command the ant from different types of start locations to various types of goals (a D4RL loading example follows the table).
Hardware Specification | No | The provided text does not contain any specific details about the hardware used for the experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions using a "conditional variational autoencoder (CVAE) [20, 8]" and a "model-free offline RL algorithm (e.g., BCQ [8] and CQL [11])" but does not specify version numbers for any software components (a CVAE sketch follows the table).
Experiment Setup | Yes | "Algorithm 1: ROMI: Reverse Offline Model-based Imagination. Require: offline dataset D_env, rollout horizon h, the number of iterations C_φ, C_θ, T, learning rates α_φ, α_θ, model-free offline RL algorithm (i.e., BCQ or CQL)." (A training skeleton wiring these arguments together follows the table.)
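
For the Open Datasets and Dataset Splits rows: the D4RL tasks cited above are distributed through the open-source d4rl package, and a dataset loads in a few lines. This is a minimal sketch assuming the standard d4rl API; the task names are examples drawn from the Maze2D, Gym-MuJoCo, and AntMaze domains the paper evaluates on, not the paper's exact configuration.

```python
# Loading a D4RL task (standard d4rl API; task names are illustrative).
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

env = gym.make('maze2d-umaze-v1')      # Maze2D; Gym-MuJoCo and AntMaze tasks
dataset = d4rl.qlearning_dataset(env)  # e.g. 'hopper-medium-v2',
                                       # 'antmaze-umaze-diverse-v0'

# One transition per row, as aligned NumPy arrays.
for key in ('observations', 'actions', 'rewards',
            'next_observations', 'terminals'):
    print(key, dataset[key].shape)
```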
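
The Pseudocode row refers to Algorithm 1 but the table carries no code. The sketch below illustrates the core reverse-imagination step the algorithm is built on: a reverse dynamics model predicts where a transition came from, and rollouts run backward from dataset states so that every imagined trajectory leads into the data. All class and function names here are hypothetical, and a deterministic MLP stands in for the paper's probabilistic reverse model; this is not the authors' implementation.

```python
# Hypothetical sketch of ROMI-style reverse imagination.
import torch
import torch.nn as nn

class ReverseDynamics(nn.Module):
    """Predict the previous state and reward from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # previous state + reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :-1], out[..., -1]  # (prev_state, reward)

@torch.no_grad()
def reverse_rollout(model, reverse_policy, start_states, horizon):
    """Roll backward from dataset states for `horizon` steps, collecting
    imagined transitions (prev_state, action, reward, state) whose forward
    direction leads into the dataset."""
    transitions, state = [], start_states
    for _ in range(horizon):
        action = reverse_policy(state)             # which action led here?
        prev_state, reward = model(state, action)  # where did it come from?
        transitions.append((prev_state, action, reward, state))
        state = prev_state
    return transitions

# Toy usage with a zero "reverse policy" stand-in.
model = ReverseDynamics(state_dim=4, action_dim=2)
batch = reverse_rollout(model, lambda s: torch.zeros(s.shape[0], 2),
                        torch.randn(8, 4), horizon=5)
```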
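
The Software Dependencies row names a conditional variational autoencoder without pinning a library or version. In ROMI's setting the CVAE acts as the reverse rollout policy: given a state, it samples an action that could plausibly have led there. Below is a minimal PyTorch sketch with hypothetical architecture and sizes, not the paper's exact model.

```python
# Minimal conditional VAE for a reverse rollout policy (hypothetical sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, state, action):
        """Return the training loss: reconstruction + KL regularizer."""
        mu, log_var = self.encoder(torch.cat([state, action], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        recon = self.decoder(torch.cat([state, z], -1))
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
        return F.mse_loss(recon, action) + 0.5 * kl

    @torch.no_grad()
    def sample(self, state):
        """Sample an action that plausibly led to `state`."""
        z = torch.randn(state.shape[0], self.latent_dim)
        return self.decoder(torch.cat([state, z], -1))

cvae = ReverseCVAE(state_dim=4, action_dim=2)
loss = cvae(torch.randn(8, 4), torch.randn(8, 2))
loss.backward()
```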
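
Finally, the Experiment Setup row quotes Algorithm 1's Require line. Read as code, that line fixes the data flow: fit the reverse policy and reverse model on D_env, imagine h-step reverse rollouts, and hand the augmented dataset to BCQ or CQL. The skeleton below wires the quoted arguments together; every helper is a hypothetical stub, not the authors' API.

```python
# Skeleton matching Algorithm 1's Require line; all callables are stubs.
from typing import Callable

def romi(D_env: list, h: int, C_phi: int, C_theta: int, T: int,
         alpha_phi: float, alpha_theta: float,
         train_reverse_policy: Callable, train_reverse_model: Callable,
         reverse_rollouts: Callable, offline_rl: Callable):
    """Data flow implied by Algorithm 1's Require line (hypothetical API)."""
    # Fit the reverse rollout policy (the CVAE) for C_phi iterations.
    reverse_policy = train_reverse_policy(D_env, steps=C_phi, lr=alpha_phi)
    # Fit the reverse dynamics model for C_theta iterations.
    reverse_model = train_reverse_model(D_env, steps=C_theta, lr=alpha_theta)
    # Imagine h-step reverse rollouts from dataset states, then train the
    # model-free learner (BCQ or CQL) on the augmented data for T iterations.
    D_model = reverse_rollouts(reverse_model, reverse_policy, D_env, h)
    return offline_rl(D_env + D_model, iterations=T)

# Degenerate call just to show the wiring.
policy = romi(D_env=[], h=5, C_phi=100, C_theta=100, T=1000,
              alpha_phi=3e-4, alpha_theta=1e-3,
              train_reverse_policy=lambda D, steps, lr: None,
              train_reverse_model=lambda D, steps, lr: None,
              reverse_rollouts=lambda model, pol, D, h: [],
              offline_rl=lambda D, iterations: None)
```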