Offline Reinforcement Learning with Value-based Episodic Memory

Authors: Xiaoteng Ma, Yiqin Yang, Hao Hu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang, Qihan Liu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical analysis for the convergence properties of our proposed VEM method, and empirical results in the D4RL benchmark show that our method achieves superior performance in most tasks, particularly in sparse-reward tasks.
Researcher Affiliation | Academia | Xiaoteng Ma (1), Yiqin Yang (1), Hao Hu (2), Qihan Liu (1), Jun Yang (1), Chongjie Zhang (2), Qianchuan Zhao (1), Bin Liang (1). (1) Department of Automation, Tsinghua University; (2) Institute for Interdisciplinary Information Sciences, Tsinghua University.
Pseudocode | Yes | A formal description for the VEM algorithm is shown in Algorithm 1 in Appendix A.1.
Open Source Code | Yes | Our code is public online at https://github.com/YiqinYang/VEM.
Open Datasets | Yes | Finally, we evaluate our method in the offline RL benchmark D4RL (Fu et al., 2020). We ran VEM on AntMaze, Adroit, and MuJoCo environments to evaluate its performance on various types of tasks.
Dataset Splits | No | The paper uses the D4RL benchmark and several environments (AntMaze, Adroit, MuJoCo) but does not describe the train/validation/test splits used, nor does it refer to predefined splits with percentages or counts.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions general software components such as the Adam optimizer but does not give version numbers for any software, libraries, or dependencies.
Experiment Setup | Yes | The hyper-parameters and network structure used in VEM are shown in Appendix C.3. Table 2 (hyper-parameter sheet) lists the critic learning rate, actor learning rate, optimizer, target update rate (κ), memory update period, batch size, discount factor, gradient steps per update, maximum length, and episode length. Table 3 lists the hyper-parameter τ used in VEM across different tasks.
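Table 2 only names the hyper-parameters, so anyone reproducing VEM has to copy the values from Appendix C.3. The Python sketch below shows one way to record them; it is an illustration, not the authors' code. The `VEMConfig` field names are hypothetical renderings of the Table 2 rows, plus a task-specific `tau` field for Table 3. No defaults are given because the real values are not stated in this report. The `soft_update` helper assumes the common convention that the target update rate κ is the Polyak-averaging weight applied to the online network's parameters. That reading is an assumption; the report does not define κ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VEMConfig:
    """Hypothetical record of the Table 2 rows (Appendix C.3).

    No defaults on purpose: every value must be copied from the paper.
    """
    critic_learning_rate: float
    actor_learning_rate: float
    optimizer: str                  # the report mentions Adam
    target_update_rate: float       # kappa (assumed Polyak weight, see soft_update)
    memory_update_period: int
    batch_size: int
    discount_factor: float
    gradient_steps_per_update: int
    maximum_length: int
    episode_length: int
    tau: float                      # task-specific value taken from Table 3


def soft_update(target: dict, online: dict, kappa: float) -> dict:
    """Return new target parameters: kappa * online + (1 - kappa) * target.

    ASSUMPTION: kappa is the interpolation weight on the online network,
    the usual meaning of a 'target update rate'. Parameters are plain
    dicts of floats here to keep the sketch dependency-free.
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    return {name: kappa * online[name] + (1.0 - kappa) * target[name]
            for name in target}
```

For example, with κ = 0.25, a target weight of 0.0 moves to 0.25 after one update toward an online weight of 1.0. A small κ therefore makes the target network trail the online network slowly.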