Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
Authors: Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC. In this section, we will conduct extensive evaluations of the empirical performance of our method, PRDC. |
| Researcher Affiliation | Collaboration | ¹National Key Laboratory for Novel Software Technology, Nanjing University; ²Polixir Technologies. |
| Pseudocode | Yes | Algorithm 1 PRDC |
| Open Source Code | Yes | Code is available at https://github.com/LAMDA-RL/PRDC. |
| Open Datasets | Yes | On the Gym and Ant Maze tasks from D4RL (Fu et al., 2020), PRDC achieves state-of-the-art performance compared with previous methods (Section 5). We collect four datasets on lineworld. They are named lineworld-easy, lineworld-medium, lineworld-hard, and lineworld-superhard in order based on their difficulties. |
| Dataset Splits | No | The paper mentions training for a certain number of steps and seeds, but it does not explicitly provide specific percentages or counts for training, validation, and test splits. While D4RL datasets often have predefined splits, the paper does not detail them within its text. |
| Hardware Specification | Yes | We use the following hardware: NVIDIA RTX A4000, 12th Gen Intel(R) Core(TM) i9-12900K |
| Software Dependencies | Yes | We use the following software versions: MuJoCo 2.2.0 (Todorov et al., 2012), Gym 0.21.0 (Brockman et al., 2016), mujoco-py 2.1.2.14, PyTorch 1.12.1 (Paszke et al., 2019) |
| Experiment Setup | Yes | The full hyper-parameters setting is in Table 3. |
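For reproduction, the software versions reported above can be expressed as a pinned requirements fragment. This is a sketch assembled from the versions the paper lists, not a file shipped with the repository; the MuJoCo 2.2.0 engine itself is installed separately from these Python packages.

```
# Pinned Python dependencies as reported in the paper (sketch)
gym==0.21.0
mujoco-py==2.1.2.14
torch==1.12.1
```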
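The paper's title describes policy regularization via a dataset constraint: roughly, penalizing the policy when its chosen action lies far from any state-action pair in the offline dataset. The sketch below is a hypothetical, brute-force illustration of that idea in plain Python; the function name `dataset_constraint_penalty` and the state-weighting parameter `beta` are assumptions for illustration only, and the authors' actual implementation is in the linked repository.

```python
import math

def dataset_constraint_penalty(state, action, dataset, beta=2.0):
    """Hypothetical sketch: penalize an action by its distance to the
    nearest (state, action) pair in the offline dataset. States are
    weighted by beta so the search favors neighbors from similar states.
    Brute-force scan for clarity; a real implementation would use a
    KD-tree or similar index over the dataset."""
    query = [beta * s for s in state] + list(action)
    best = math.inf
    for ds, da in dataset:
        key = [beta * s for s in ds] + list(da)
        best = min(best, math.dist(query, key))
    return best

# Toy usage: the penalty is zero when the exact (state, action)
# pair appears in the dataset, and positive otherwise.
data = [([0.0, 0.0], [1.0]), ([1.0, 0.0], [-1.0])]
print(dataset_constraint_penalty([0.0, 0.0], [1.0], data))  # 0.0
```

Keeping the policy's actions near the dataset in this sense is one way to limit queries to out-of-distribution actions, which is the usual source of the value overestimation the paper targets.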