Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
Authors: Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC. In this section, we will conduct extensive evaluations of the empirical performance of our method, PRDC. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Polixir Technologies. |
| Pseudocode | Yes | Algorithm 1 PRDC |
| Open Source Code | Yes | Code is available at https://github.com/LAMDA-RL/PRDC. |
| Open Datasets | Yes | On the Gym and AntMaze tasks from D4RL (Fu et al., 2020), PRDC achieves state-of-the-art performance compared with previous methods (Section 5). We collect four datasets on lineworld. They are named lineworld-easy, lineworld-medium, lineworld-hard, and lineworld-superhard in order based on their difficulties. |
| Dataset Splits | No | The paper mentions training for a certain number of steps and seeds, but it does not explicitly provide specific percentages or counts for training, validation, and test splits. While D4RL datasets often have predefined splits, the paper does not detail them within its text. |
| Hardware Specification | Yes | We use the following hardware: NVIDIA RTX A4000, 12th Gen Intel(R) Core(TM) i9-12900K |
| Software Dependencies | Yes | We use the following software versions: MuJoCo 2.2.0 (Todorov et al., 2012); Gym 0.21.0 (Brockman et al., 2016); MuJoCo-py 2.1.2.14; PyTorch 1.12.1 (Paszke et al., 2019) |
| Experiment Setup | Yes | The full hyper-parameters setting is in Table 3. |
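
The versions in the Software Dependencies row can be checked against a local environment before attempting a reproduction. The following is a minimal sketch, not part of the paper or its repository: the PyPI distribution names (`mujoco`, `gym`, `mujoco-py`, `torch`) and the helper names `installed_version` and `check_versions` are assumptions made for illustration; only the version numbers come from the table above.

```python
from importlib import metadata

# Version numbers are taken from the Software Dependencies row.
# The distribution names (dict keys) are assumed PyPI names, not stated in the source.
EXPECTED = {
    "mujoco": "2.2.0",
    "gym": "0.21.0",
    "mujoco-py": "2.1.2.14",
    "torch": "1.12.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Map each distribution name to (expected, found, matches)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        # Local build tags such as "+cu116" are ignored when comparing.
        matches = found is not None and found.split("+")[0] == want
        report[name] = (want, found, matches)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} [{status}]")
```

The `lookup` parameter is injectable so the comparison logic can be exercised without the packages installed. A mismatch here does not by itself mean results will differ; it only flags a departure from the reported environment.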