Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

Authors: Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC. In this section, we will conduct extensive evaluations of the empirical performance of our method, PRDC.
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Polixir Technologies.
Pseudocode | Yes | Algorithm 1: PRDC
Open Source Code | Yes | Code is available at https://github.com/LAMDA-RL/PRDC.
Open Datasets | Yes | On the Gym and AntMaze tasks from D4RL (Fu et al., 2020), PRDC achieves state-of-the-art performance compared with previous methods (Section 5). We collect four datasets on lineworld. They are named lineworld-easy, lineworld-medium, lineworld-hard, and lineworld-superhard in order based on their difficulties.
Dataset Splits | No | The paper mentions training for a certain number of steps and seeds, but it does not explicitly provide specific percentages or counts for training, validation, and test splits. While D4RL datasets often have predefined splits, the paper does not detail them within its text.
Hardware Specification | Yes | We use the following hardware: NVIDIA RTX A4000, 12th Gen Intel(R) Core(TM) i9-12900K.
Software Dependencies | Yes | We use the following software versions: MuJoCo 2.2.0 (Todorov et al., 2012), Gym 0.21.0 (Brockman et al., 2016), mujoco-py 2.1.2.14, PyTorch 1.12.1 (Paszke et al., 2019).
Experiment Setup | Yes | The full hyper-parameters setting is in Table 3.
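
To make the Open Datasets and Software Dependencies rows above concrete, the sketch below loads one of the cited D4RL datasets under the listed dependency versions (Gym 0.21.0, mujoco-py 2.1.2.14, PyTorch 1.12.1). This is a minimal illustration, not the authors' pipeline: the dataset identifier "halfcheetah-medium-v2" and the use of d4rl.qlearning_dataset are assumptions chosen for demonstration, and the released code at https://github.com/LAMDA-RL/PRDC remains the authoritative reference for the experiment setup.

```python
# Minimal sketch (assumed setup, not the authors' code): load a D4RL dataset
# of the kind referenced in the Open Datasets row above.
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

# Illustrative dataset id; the paper evaluates on D4RL Gym and AntMaze tasks.
env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns a dict of numpy arrays with the standard
# offline-RL fields: observations, actions, rewards, next_observations, terminals.
dataset = d4rl.qlearning_dataset(env)
print({key: value.shape for key, value in dataset.items()})
```

A check like this is also a cheap way to verify that the pinned MuJoCo/Gym versions are installed consistently before attempting to reproduce the reported results.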