Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

Authors: Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present empirical evaluations of SQL and EQL in this section. We first evaluate SQL and EQL against other baseline algorithms on benchmark offline RL datasets."
Researcher Affiliation | Collaboration | (1) Institute for AI Industry Research (AIR), Tsinghua University; (2) Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University; (3) Yale University; (4) Northwestern University; (5) Shanghai Artificial Intelligence Laboratory. *Work done while at JD Technology.
Pseudocode | Yes | "We summarize the training procedure in Algorithm 1."
Open Source Code | Yes | "Code is available at https://github.com/ryanxhr/IVR."
Open Datasets | Yes | "We first evaluate our approach on D4RL datasets (Fu et al., 2020)." (A loading sketch follows the table.)
Dataset Splits | No | The paper refers to using D4RL datasets and performing evaluations, but it does not explicitly state training, validation, and test split percentages or sample counts for reproduction.
Hardware Specification | No | The paper mentions implementing the method in JAX but does not specify the GPU, CPU, or cloud hardware used for the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer, JAX, and d3rlpy (Seno & Imai, 2021), but does not provide version numbers for these key software dependencies.
Experiment Setup | Yes | "In SQL and EQL, we use a 2-layer MLP with 256 hidden units, and we use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2 × 10⁻⁴ for all neural networks. Following Mnih et al. (2013) and Lillicrap et al. (2016), we introduce a target critic network with soft update weight 5 × 10⁻³. [...] The only hyperparameter α used in SQL and EQL is listed in Table 5." (A configuration sketch follows the table.)
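
For the Open Datasets row: D4RL datasets are publicly available through the standard d4rl package. Below is a minimal loading sketch, assuming the usual Gym-based d4rl API; the task name is illustrative and not taken from the paper.

```python
# Minimal sketch: loading a D4RL benchmark dataset (Fu et al., 2020).
# Assumes the standard `d4rl` package is installed; the task name is illustrative.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```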
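For the Experiment Setup row: the paper specifies 2-layer MLPs with 256 hidden units, Adam at 2 × 10⁻⁴, and a target critic with soft-update weight 5 × 10⁻³, implemented in JAX. The sketch below shows that configuration in JAX/Flax/Optax; the module and variable names are hypothetical and this is not the authors' released code.

```python
# Minimal sketch of the stated setup: 2-layer MLP critic (256 hidden units),
# Adam at lr 2e-4, and a Polyak-averaged target network with weight 5e-3.
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class Critic(nn.Module):
    @nn.compact
    def __call__(self, obs, act):
        x = jnp.concatenate([obs, act], axis=-1)
        x = nn.relu(nn.Dense(256)(x))  # hidden layer 1
        x = nn.relu(nn.Dense(256)(x))  # hidden layer 2
        return nn.Dense(1)(x)          # scalar Q-value

critic = Critic()
rng = jax.random.PRNGKey(0)
obs, act = jnp.zeros((1, 17)), jnp.zeros((1, 6))  # illustrative HalfCheetah shapes
params = critic.init(rng, obs, act)
target_params = params                            # target starts as a copy

opt = optax.adam(2e-4)                            # learning rate from the paper
opt_state = opt.init(params)

# Soft target update after each gradient step:
# target <- 5e-3 * online + (1 - 5e-3) * target
target_params = optax.incremental_update(params, target_params, 5e-3)
```

`optax.incremental_update` implements exactly the Polyak averaging described: each call blends the online parameters into the target parameters with the given step size.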