Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Authors: Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical evaluations of SQL and EQL in this section. We first evaluate SQL and EQL against other baseline algorithms on benchmark offline RL datasets. |
| Researcher Affiliation | Collaboration | ¹Institute for AI Industry Research (AIR), Tsinghua University; ²Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University; ³Yale University; ⁴Northwestern University; ⁵Shanghai Artificial Intelligence Laboratory. *Work done while at JD Technology. |
| Pseudocode | Yes | We summarize the training procedure in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/ryanxhr/IVR. |
| Open Datasets | Yes | We first evaluate our approach on D4RL datasets (Fu et al., 2020). |
| Dataset Splits | No | The paper refers to using D4RL datasets and performing evaluations, but it does not explicitly state specific training, validation, and test split percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper mentions implementing the method in JAX but does not provide any specific GPU, CPU, or cloud hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'JAX', and 'd3rlpy (Seno & Imai, 2021)', but does not provide specific version numbers for JAX or d3rlpy, which are key software dependencies. |
| Experiment Setup | Yes | In SQL and EQL, we use 2-layer MLP with 256 hidden units, we use Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2 × 10⁻⁴ for all neural networks. Following Mnih et al. (2013); Lillicrap et al. (2016), we introduce a target critic network with soft update weight 5 × 10⁻³. [...] The only hyperparameter α used in SQL and EQL is listed in Table 5. |
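
The experiment setup row can be illustrated with a minimal sketch of the reported hyperparameters and of the soft target-network update, taking the learning rate as 2 × 10⁻⁴ and the soft update weight as 5 × 10⁻³. This is not the authors' JAX implementation: `SetupConfig`, `soft_update`, and the toy parameter dictionaries are hypothetical names introduced here, and reading "2-layer MLP" as two hidden layers of 256 units is an assumption. The per-dataset value of α comes from Table 5 of the paper and is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical container for the hyperparameters quoted in the table.
@dataclass(frozen=True)
class SetupConfig:
    # Assumption: "2-layer MLP with 256 hidden units" read as two hidden layers.
    hidden_dims: Tuple[int, ...] = (256, 256)
    learning_rate: float = 2e-4          # Adam learning rate for all networks
    target_update_weight: float = 5e-3   # soft update weight for the target critic

# Toy stand-in for network parameters: name -> flat list of weights.
Params = Dict[str, List[float]]

def soft_update(target: Params, online: Params, tau: float) -> Params:
    """Move each target weight a fraction tau toward the online weight."""
    return {
        name: [(1.0 - tau) * t + tau * o
               for t, o in zip(target[name], online[name])]
        for name in target
    }

cfg = SetupConfig()
online = {"w": [1.0, -2.0], "b": [0.5]}
target = {"w": [0.0, 0.0], "b": [0.0]}

# One soft update step: the target critic moves 0.5% of the way to the online critic.
target = soft_update(target, online, cfg.target_update_weight)
```

With a weight this small the target critic trails the online critic slowly, which is the usual motivation for soft updates in the works the row cites.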