Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Authors: Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhi-Hong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the D4RL benchmark show that PBRL has better performance compared to the state-of-the-art algorithms.
Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology); Lingxiao Wang (Northwestern University); Zhuoran Yang (Princeton University); Zhihong Deng (University of Technology Sydney); Animesh Garg (University of Toronto, Vector Institute, NVIDIA); Peng Liu (Harbin Institute of Technology); Zhaoran Wang (Northwestern University)
Pseudocode | Yes | Algorithm 1: PBRL algorithm (a simplified sketch follows the table).
Open Source Code | Yes | The code is available at https://github.com/Baichenjia/PBRL.
Open Datasets | Yes | Our experiments on the D4RL benchmark (Fu et al., 2020) show that PBRL provides reasonable uncertainty quantification and yields better performance compared to the state-of-the-art algorithms. The dataset is released at http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2_old/. A loading example follows the table.
Dataset Splits | No | The paper uses the D4RL benchmark datasets, which are offline datasets used for training policies in offline RL. It does not describe explicit train/validation/test splits of these datasets in the traditional supervised learning sense (e.g., specific percentages or sample counts for validation); a sampling sketch follows the table.
Hardware Specification | Yes | We run experiments on a single A100 GPU.
Software Dependencies | No | The paper mentions several software implementations and libraries used (e.g., “SAC implementations”, “CQL”, “BEAR”, “UWAC”, “MOPO”, “TD3-BC”) but does not specify their version numbers.
Experiment Setup | Yes | Table 2 (Hyper-parameters of PBRL): K = 10, Q-network = FC(256, 256, 256), β_in = 0.01, β_ood = 5.0 → 0.2 (decaying strategy), τ = 0.005, γ = 0.99, actor learning rate = 1e-4, critic learning rate = 3e-4, optimizer = Adam, H = 1M, N_ood = 10 (collected into a config sketch below).
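
For orientation, the following is a minimal sketch of the pessimistic-bootstrapping idea behind Algorithm 1: an ensemble of K bootstrapped Q-networks whose disagreement (standard deviation) serves as an uncertainty penalty, applied weakly (β_in) to in-distribution TD targets and strongly (β_ood) to out-of-distribution actions. Everything below is an illustrative assumption rather than the authors' implementation (see the repository above for that); only the hyper-parameter values come from Table 2.

    # Illustrative sketch of PBRL-style pessimistic bootstrapping (not the authors' code).
    import torch
    import torch.nn as nn

    K = 10          # ensemble size (Table 2)
    GAMMA = 0.99    # discount factor
    BETA_IN = 0.01  # penalty for in-distribution targets
    BETA_OOD = 5.0  # penalty for OOD actions (decayed to 0.2 in the paper)

    def make_q(obs_dim, act_dim):
        # Q-network: FC(256, 256, 256), as in Table 2.
        return nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def pessimistic_value(critics, obs, act, beta):
        # Ensemble mean minus beta times the ensemble standard deviation:
        # disagreement among the bootstrapped critics acts as the uncertainty estimate.
        x = torch.cat([obs, act], dim=-1)
        values = torch.stack([q(x) for q in critics])  # shape (K, batch, 1)
        return values.mean(0) - beta * values.std(0)

    # Toy usage on random tensors; dimensions are arbitrary placeholders.
    obs_dim, act_dim, batch = 11, 3, 32
    critics = [make_q(obs_dim, act_dim) for _ in range(K)]
    obs, next_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
    act, next_act = torch.randn(batch, act_dim), torch.randn(batch, act_dim)
    rew = torch.randn(batch, 1)

    with torch.no_grad():
        # In-distribution TD target with a mildly pessimistic bootstrap.
        target_in = rew + GAMMA * pessimistic_value(critics, next_obs, next_act, BETA_IN)
        # OOD actions (sampled from the current policy in PBRL; random stand-ins here)
        # receive the much stronger penalty BETA_OOD.
        ood_act = torch.randn(batch, act_dim)
        target_ood = pessimistic_value(critics, obs, ood_act, BETA_OOD)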
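
As a quick illustration of how a D4RL dataset is typically loaded (assuming the gym and d4rl packages are installed; the task name below is an arbitrary example, not a claim about which datasets the paper uses):

    # Load a D4RL offline dataset (illustrative usage of the public d4rl API).
    import gym
    import d4rl  # importing d4rl registers the offline-RL environments with gym

    env = gym.make("hopper-medium-v2")     # example task from the benchmark
    dataset = d4rl.qlearning_dataset(env)  # dict of NumPy arrays

    # Transitions available for offline training: (s, a, r, s', done).
    for key in ("observations", "actions", "rewards", "next_observations", "terminals"):
        print(key, dataset[key].shape)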
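
Consistent with the Dataset Splits entry, offline RL training typically draws random minibatches from the entire dataset rather than from held-out splits. A minimal sketch, assuming the dictionary of NumPy arrays returned by d4rl.qlearning_dataset in the previous example:

    # Sample training minibatches from the full offline dataset; no
    # train/validation/test split is involved (illustrative sketch).
    import numpy as np

    def sample_batch(dataset, batch_size=256, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        n = len(dataset["observations"])
        idx = rng.integers(0, n, size=batch_size)  # uniform random indices
        return {key: value[idx] for key, value in dataset.items()}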
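
Finally, the Table 2 hyper-parameters collected into a single configuration mapping; the values are transcribed from the paper, while the key names are illustrative:

    # PBRL hyper-parameters from Table 2 (key names are illustrative).
    PBRL_HYPERPARAMS = {
        "ensemble_size_K": 10,
        "q_network_hidden": (256, 256, 256),  # fully connected layers
        "beta_in": 0.01,                      # in-distribution penalty coefficient
        "beta_ood": (5.0, 0.2),               # decayed from 5.0 to 0.2 during training
        "tau": 0.005,                         # target-network soft-update rate
        "gamma": 0.99,                        # discount factor
        "actor_lr": 1e-4,
        "critic_lr": 3e-4,
        "optimizer": "Adam",
        "total_steps_H": 1_000_000,
        "num_ood_actions_N_ood": 10,
    }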