Offline Reinforcement Learning with Reverse Model-based Imagination

Authors: Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines. Our method achieves the best or comparable performance on 16 out of 24 tasks among all algorithms. Ablation studies verify that the reverse model imagination can effectively generate more conservative behaviors and achieve state-of-the-art performance on offline RL benchmark tasks."
Researcher Affiliation | Collaboration | Jianhao Wang (1), Wenzhe Li (1), Haozhe Jiang (1), Guangxiang Zhu (2), Siyuan Li (1), Chongjie Zhang (1); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, China; (2) Baidu Inc., China
Pseudocode | Yes | Algorithm 1, "ROMI: Reverse Offline Model-based Imagination" (a reverse-rollout sketch follows the table).
Open Source Code | No | The paper states: "Videos of the experiments are available online: https://sites.google.com/view/romi-offlinerl/". This link is explicitly for videos of the experiments, not for the source code of the method itself.
Open Datasets | Yes | "We conduct extensive evaluations on the D4RL offline benchmark suite [18]. Empirical results show that ROMI significantly outperforms state-of-the-art model-free and model-based baselines."
Dataset Splits | Yes | "We evaluate ROMI on a wide range of domains in the D4RL benchmark [18], including the Maze2D domain, the Gym-MuJoCo tasks, and the AntMaze domain." The D4RL benchmark [18] introduces three flavors of datasets (i.e., fixed, diverse, and play) in this setting, which command the ant from different types of start locations to various types of goals (a D4RL loading example follows the table).
Hardware Specification | No | The provided text does not contain any specific details about the hardware used for the experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions using a "conditional variational autoencoder (CVAE) [20, 8]" and a "model-free offline RL algorithm (e.g., BCQ [8] and CQL [11])" but does not specify version numbers for any software components (a CVAE sketch follows the table).
Experiment Setup | Yes | "Algorithm 1: ROMI: Reverse Offline Model-based Imagination. Require: offline dataset D_env, rollout horizon h, the number of iterations C_φ, C_θ, T, learning rates α_φ, α_θ, model-free offline RL algorithm (i.e., BCQ or CQL)." (A training skeleton wiring these arguments together follows the table.)
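
For the Open Datasets and Dataset Splits rows: the D4RL tasks cited above are distributed through the open-source d4rl package, and a dataset loads in a few lines. This is a minimal sketch assuming the standard d4rl API; the task names are examples drawn from the Maze2D, Gym-MuJoCo, and AntMaze domains the paper evaluates on, not the paper's exact configuration.

```python
# Loading a D4RL task (standard d4rl API; task names are illustrative).
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

env = gym.make('maze2d-umaze-v1')      # Maze2D; Gym-MuJoCo and AntMaze tasks
dataset = d4rl.qlearning_dataset(env)  # e.g. 'hopper-medium-v2',
                                       # 'antmaze-umaze-diverse-v0'

# One transition per row, as aligned NumPy arrays.
for key in ('observations', 'actions', 'rewards',
            'next_observations', 'terminals'):
    print(key, dataset[key].shape)
```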
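
The Pseudocode row refers to Algorithm 1 but the table carries no code. The sketch below illustrates the core reverse-imagination step the algorithm is built on: a reverse dynamics model predicts where a transition came from, and rollouts run backward from dataset states so that every imagined trajectory leads into the data. All class and function names here are hypothetical, and a deterministic MLP stands in for the paper's probabilistic reverse model; this is not the authors' implementation.

```python
# Hypothetical sketch of ROMI-style reverse imagination.
import torch
import torch.nn as nn

class ReverseDynamics(nn.Module):
    """Predict the previous state and reward from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # previous state + reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :-1], out[..., -1]  # (prev_state, reward)

@torch.no_grad()
def reverse_rollout(model, reverse_policy, start_states, horizon):
    """Roll backward from dataset states for `horizon` steps, collecting
    imagined transitions (prev_state, action, reward, state) whose forward
    direction leads into the dataset."""
    transitions, state = [], start_states
    for _ in range(horizon):
        action = reverse_policy(state)             # which action led here?
        prev_state, reward = model(state, action)  # where did it come from?
        transitions.append((prev_state, action, reward, state))
        state = prev_state
    return transitions

# Toy usage with a zero "reverse policy" stand-in.
model = ReverseDynamics(state_dim=4, action_dim=2)
batch = reverse_rollout(model, lambda s: torch.zeros(s.shape[0], 2),
                        torch.randn(8, 4), horizon=5)
```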
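
The Software Dependencies row names a conditional variational autoencoder without pinning a library or version. In ROMI's setting the CVAE acts as the reverse rollout policy: given a state, it samples an action that could plausibly have led there. Below is a minimal PyTorch sketch with hypothetical architecture and sizes, not the paper's exact model.

```python
# Minimal conditional VAE for a reverse rollout policy (hypothetical sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, state, action):
        """Return the training loss: reconstruction + KL regularizer."""
        mu, log_var = self.encoder(torch.cat([state, action], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        recon = self.decoder(torch.cat([state, z], -1))
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
        return F.mse_loss(recon, action) + 0.5 * kl

    @torch.no_grad()
    def sample(self, state):
        """Sample an action that plausibly led to `state`."""
        z = torch.randn(state.shape[0], self.latent_dim)
        return self.decoder(torch.cat([state, z], -1))

cvae = ReverseCVAE(state_dim=4, action_dim=2)
loss = cvae(torch.randn(8, 4), torch.randn(8, 2))
loss.backward()
```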
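
Finally, the Experiment Setup row quotes Algorithm 1's Require line. Read as code, that line fixes the data flow: fit the reverse policy and reverse model on D_env, imagine h-step reverse rollouts, and hand the augmented dataset to BCQ or CQL. The skeleton below wires the quoted arguments together; every helper is a hypothetical stub, not the authors' API.

```python
# Skeleton matching Algorithm 1's Require line; all callables are stubs.
from typing import Callable

def romi(D_env: list, h: int, C_phi: int, C_theta: int, T: int,
         alpha_phi: float, alpha_theta: float,
         train_reverse_policy: Callable, train_reverse_model: Callable,
         reverse_rollouts: Callable, offline_rl: Callable):
    """Data flow implied by Algorithm 1's Require line (hypothetical API)."""
    # Fit the reverse rollout policy (the CVAE) for C_phi iterations.
    reverse_policy = train_reverse_policy(D_env, steps=C_phi, lr=alpha_phi)
    # Fit the reverse dynamics model for C_theta iterations.
    reverse_model = train_reverse_model(D_env, steps=C_theta, lr=alpha_theta)
    # Imagine h-step reverse rollouts from dataset states, then train the
    # model-free learner (BCQ or CQL) on the augmented data for T iterations.
    D_model = reverse_rollouts(reverse_model, reverse_policy, D_env, h)
    return offline_rl(D_env + D_model, iterations=T)

# Degenerate call just to show the wiring.
policy = romi(D_env=[], h=5, C_phi=100, C_theta=100, T=1000,
              alpha_phi=3e-4, alpha_theta=1e-3,
              train_reverse_policy=lambda D, steps, lr: None,
              train_reverse_model=lambda D, steps, lr: None,
              reverse_rollouts=lambda model, pol, D, h: [],
              offline_rl=lambda D, iterations: None)
```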