reproducibilityindex.ai

Offline Reinforcement Learning with Value-based Episodic Memory

Authors: Xiaoteng Ma, Yiqin Yang, Hao Hu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang, Qihan Liu

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical analysis for the convergence properties of our proposed VEM method, and empirical results in the D4RL benchmark show that our method achieves superior performance in most tasks, particularly in sparse-reward tasks.
Researcher Affiliation	Academia	Xiaoteng Ma¹, Yiqin Yang¹, Hao Hu², Qihan Liu¹, Jun Yang¹, Chongjie Zhang², Qianchuan Zhao¹, Bin Liang¹. ¹Department of Automation, Tsinghua University; ²Institute for Interdisciplinary Information Sciences, Tsinghua University
Pseudocode	Yes	A formal description for the VEM algorithm is shown in Algorithm 1 in Appendix A.1.
Open Source Code	Yes	Our code is public online at https://github.com/YiqinYang/VEM.
Open Datasets	Yes	Finally, we evaluate our method in the offline RL benchmark D4RL (Fu et al., 2020). We ran VEM on AntMaze, Adroit, and MuJoCo environments to evaluate its performance on various types of tasks.
Dataset Splits	No	The paper mentions using the D4RL benchmark and various environments (AntMaze, Adroit, MuJoCo) but does not explicitly describe the specific train/validation/test dataset splits used for reproduction, nor does it refer to predefined splits with percentages or counts.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions general software components like "Adam" for the optimizer but does not provide specific version numbers for any software, libraries, or dependencies.
Experiment Setup	Yes	The hyper-parameters and network structure used in VEM are shown in Appendix C.3. Table 2: Hyper-parameter Sheet [lists Critic Learning Rate, Actor Learning Rate, Optimizer, Target Update Rate (κ), Memory Update Period, Batch Size, Discount Factor, Gradient Steps per Update, Maximum Length, Episode Length]. Table 3: Hyper-Parameter τ used in VEM across different tasks.