Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
Authors: Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhi-Hong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on D4RL benchmark show that PBRL has better performance compared to the state-of-the-art algorithms. |
| Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology); Lingxiao Wang (Northwestern University); Zhuoran Yang (Princeton University); Zhihong Deng (University of Technology Sydney); Animesh Garg (University of Toronto, Vector Institute, NVIDIA); Peng Liu (Harbin Institute of Technology); Zhaoran Wang (Northwestern University) |
| Pseudocode | Yes | Algorithm 1 PBRL algorithm |
| Open Source Code | Yes | The code is available at https://github.com/Baichenjia/PBRL. |
| Open Datasets | Yes | Our experiments on the D4RL benchmark (Fu et al., 2020) show that PBRL provides reasonable uncertainty quantification and yields better performance compared to the state-of-the-art algorithms. The dataset is released at http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2_old/. |
| Dataset Splits | No | The paper uses the D4RL benchmark datasets, which are offline datasets used for training policies in offline RL. It does not describe explicit train/validation/test splits of these datasets in the traditional supervised learning sense (e.g., specific percentages or sample counts for validation). |
| Hardware Specification | Yes | We run experiments on a single A100 GPU. |
| Software Dependencies | No | The paper mentions several software implementations and libraries used (e.g., “SAC implementations”, “CQL”, “BEAR”, “UWAC”, “MOPO”, “TD3-BC”) but does not specify their version numbers. |
| Experiment Setup | Yes | Table 2: Hyper-parameters of PBRL: K = 10; Q-network = FC(256, 256, 256); β_in = 0.01; β_ood = 5.0 → 0.2 (decaying strategy); τ = 0.005; γ = 0.99; actor lr = 1e-4; critic lr = 3e-4; optimizer = Adam; H = 1M; N_ood = 10. |
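The Table 2 hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only: the dictionary and its key names are assumptions made for readability, not taken from the authors' repository (https://github.com/Baichenjia/PBRL).

```python
# Illustrative config mirroring the Table 2 hyper-parameters reported for PBRL.
# Key names are hypothetical; consult the official repository for the real ones.
PBRL_CONFIG = {
    "ensemble_size_K": 10,                # K bootstrapped Q-networks
    "q_network_hidden": (256, 256, 256),  # FC(256, 256, 256)
    "beta_in": 0.01,                      # penalty weight, in-distribution data
    "beta_ood": (5.0, 0.2),               # OOD penalty, decayed from 5.0 to 0.2
    "tau": 0.005,                         # target-network soft-update rate
    "gamma": 0.99,                        # discount factor
    "actor_lr": 1e-4,
    "critic_lr": 3e-4,
    "optimizer": "Adam",
    "total_steps_H": 1_000_000,           # H = 1M gradient steps
    "n_ood_actions": 10,                  # N_ood sampled OOD actions
}
```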
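The method's central idea, reflected in the title and the rows above, is to quantify uncertainty via the disagreement of a bootstrapped Q-ensemble and penalize value targets accordingly. A minimal sketch of that idea follows; the function name and the exact penalty form (mean minus a β-weighted standard deviation) are illustrative assumptions, not the paper's implementation.

```python
import statistics

def pessimistic_target(q_values, reward, gamma=0.99, beta=0.01):
    """Illustrative pessimistic bootstrap target (not the authors' code).

    q_values: next-state Q estimates from the K ensemble members.
    The ensemble's standard deviation serves as an uncertainty estimate,
    which is subtracted from the mean before the Bellman backup.
    """
    q_mean = statistics.fmean(q_values)
    q_std = statistics.stdev(q_values)  # ensemble disagreement = uncertainty
    return reward + gamma * (q_mean - beta * q_std)
```

For example, three ensemble members predicting `[1.0, 1.2, 0.8]` have mean 1.0 and sample standard deviation 0.2, so the penalized backup is `0.5 + 0.99 * (1.0 - 0.01 * 0.2)`. In the paper, larger β values (β_ood) are applied to out-of-distribution actions, which the ensemble disagrees on most.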