Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

Authors: Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC. In this section, we will conduct extensive evaluations of the empirical performance of our method, PRDC.
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Polixir Technologies.
Pseudocode | Yes | Algorithm 1: PRDC
Open Source Code | Yes | Code is available at https://github.com/LAMDA-RL/PRDC.
Open Datasets | Yes | On the Gym and AntMaze tasks from D4RL (Fu et al., 2020), PRDC achieves state-of-the-art performance compared with previous methods (Section 5). We collect four datasets on lineworld. They are named lineworld-easy, lineworld-medium, lineworld-hard, and lineworld-superhard in order based on their difficulties.
Dataset Splits | No | The paper mentions training for a certain number of steps and seeds, but it does not explicitly provide specific percentages or counts for training, validation, and test splits. While D4RL datasets often have predefined splits, the paper does not detail them within its text.
Hardware Specification | Yes | We use the following hardware: NVIDIA RTX A4000, 12th Gen Intel(R) Core(TM) i9-12900K.
Software Dependencies | Yes | We use the following software versions: MuJoCo 2.2.0 (Todorov et al., 2012), Gym 0.21.0 (Brockman et al., 2016), mujoco-py 2.1.2.14, PyTorch 1.12.1 (Paszke et al., 2019).
Experiment Setup | Yes | The full hyper-parameters setting is in Table 3.
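
To make the Open Datasets and Software Dependencies rows above concrete, the sketch below loads one of the cited D4RL datasets under the listed dependency versions (Gym 0.21.0, mujoco-py 2.1.2.14, PyTorch 1.12.1). This is a minimal illustration, not the authors' pipeline: the dataset identifier "halfcheetah-medium-v2" and the use of d4rl.qlearning_dataset are assumptions chosen for demonstration, and the released code at https://github.com/LAMDA-RL/PRDC remains the authoritative reference for the experiment setup.

```python
# Minimal sketch (assumed setup, not the authors' code): load a D4RL dataset
# of the kind referenced in the Open Datasets row above.
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

# Illustrative dataset id; the paper evaluates on D4RL Gym and AntMaze tasks.
env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns a dict of numpy arrays with the standard
# offline-RL fields: observations, actions, rewards, next_observations, terminals.
dataset = d4rl.qlearning_dataset(env)
print({key: value.shape for key, value in dataset.items()})
```

A check like this is also a cheap way to verify that the pinned MuJoCo/Gym versions are installed consistently before attempting to reproduce the reported results.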