Offline Reinforcement Learning with Value-based Episodic Memory
Authors: Xiaoteng Ma, Yiqin Yang, Hao Hu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang, Qihan Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical analysis for the convergence properties of our proposed VEM method, and empirical results in the D4RL benchmark show that our method achieves superior performance in most tasks, particularly in sparse-reward tasks. |
| Researcher Affiliation | Academia | Xiaoteng Ma¹, Yiqin Yang¹, Hao Hu², Qihan Liu¹, Jun Yang¹, Chongjie Zhang², Qianchuan Zhao¹, Bin Liang¹. ¹Department of Automation, Tsinghua University; ²Institute for Interdisciplinary Information Sciences, Tsinghua University |
| Pseudocode | Yes | A formal description for the VEM algorithm is shown in Algorithm 1 in Appendix A.1. |
| Open Source Code | Yes | Our code is public online at https://github.com/YiqinYang/VEM. |
| Open Datasets | Yes | Finally, we evaluate our method in the offline RL benchmark D4RL (Fu et al., 2020). We ran VEM on AntMaze, Adroit, and MuJoCo environments to evaluate its performance on various types of tasks. |
| Dataset Splits | No | The paper mentions using the D4RL benchmark and various environments (AntMaze, Adroit, MuJoCo) but does not explicitly describe the specific train/validation/test dataset splits used for reproduction, nor does it refer to predefined splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions general software components like "Adam" for the optimizer but does not provide specific version numbers for any software, libraries, or dependencies. |
| Experiment Setup | Yes | The hyper-parameters and network structure used in VEM are shown in Appendix C.3. Table 2: Hyper-parameter Sheet [lists Critic Learning Rate, Actor Learning Rate, Optimizer, Target Update Rate (κ), Memory Update Period, Batch Size, Discount Factor, Gradient Steps per Update, Maximum Length, Episode Length]. Table 3: Hyper-Parameter τ used in VEM across different tasks. |
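
For anyone attempting a reproduction, the rows listed for Table 2 and Table 3 can be collected into a single configuration object. The sketch below is only an illustration: the class name `VEMHyperParams`, the field names, and the helper `missing_fields` are hypothetical and not taken from the authors' code. Numeric values are deliberately left unset because this summary does not state them; they would need to be copied from Appendix C.3 of the paper. The only default filled in is the optimizer, which the summary names as Adam.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class VEMHyperParams:
    """Hypothetical container; field names mirror the rows listed for Table 2 and Table 3."""

    critic_learning_rate: Optional[float] = None
    actor_learning_rate: Optional[float] = None
    optimizer: Optional[str] = "Adam"  # the only value named in this summary
    target_update_rate: Optional[float] = None  # kappa in Table 2
    memory_update_period: Optional[int] = None
    batch_size: Optional[int] = None
    discount_factor: Optional[float] = None
    gradient_steps_per_update: Optional[int] = None
    maximum_length: Optional[int] = None
    episode_length: Optional[int] = None
    tau: Optional[float] = None  # task-specific, listed in Table 3


def missing_fields(cfg: VEMHyperParams) -> List[str]:
    """Return the names of all fields that still have no value."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) is None]


if __name__ == "__main__":
    cfg = VEMHyperParams()
    # Lists every hyper-parameter that must still be filled in from the paper.
    print(missing_fields(cfg))
```

Calling `missing_fields` before a run gives a quick check that no hyper-parameter from the appendix was overlooked, which is useful here because `tau` varies per task.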