Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
Authors: Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhi-Hong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on D4RL benchmark show that PBRL has better performance compared to the state-of-the-art algorithms. |
| Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology); Lingxiao Wang (Northwestern University); Zhuoran Yang (Princeton University); Zhihong Deng (University of Technology Sydney); Animesh Garg (University of Toronto, Vector Institute, NVIDIA); Peng Liu (Harbin Institute of Technology); Zhaoran Wang (Northwestern University) |
| Pseudocode | Yes | Algorithm 1: PBRL algorithm. (A hedged sketch of the ensemble update appears after this table.) |
| Open Source Code | Yes | The code is available at https://github.com/Baichenjia/PBRL. |
| Open Datasets | Yes | Our experiments on the D4RL benchmark (Fu et al., 2020) show that PBRL provides reasonable uncertainty quantification and yields better performance compared to the state-of-the-art algorithms. The dataset is released at http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2_old/. (A loading sketch appears after this table.) |
| Dataset Splits | No | The paper trains on the D4RL benchmark datasets as offline data for policy learning. It does not describe explicit train/validation/test splits of these datasets in the traditional supervised-learning sense (e.g., specific percentages or sample counts for validation). |
| Hardware Specification | Yes | We run experiments on a single A100 GPU. |
| Software Dependencies | No | The paper mentions several software implementations and libraries used (e.g., “SAC implementations”, “CQL”, “BEAR”, “UWAC”, “MOPO”, “TD3-BC”) but does not specify their version numbers. |
| Experiment Setup | Yes | Table 2: Hyper-parameters of PBRL: K = 10; Q-network = FC(256, 256, 256); β_in = 0.01; β_ood = 5.0 → 0.2 (decaying strategy); τ = 0.005; γ = 0.99; actor learning rate = 1e-4; critic learning rate = 3e-4; optimizer = Adam; H = 1M; N_ood = 10. (These values appear as defaults in the sketch after this table.) |
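
As a complement to the pseudocode and experiment-setup rows above, the following is a minimal sketch of a PBRL-style pessimistic ensemble update in PyTorch, using the Table 2 values (K = 10, FC(256,256,256) critics, β_in = 0.01, β_ood = 5.0, γ = 0.99, N_ood = 10) as defaults. The names `QEnsemble` and `pessimistic_targets` are illustrative placeholders rather than the authors' code (the official implementation is at https://github.com/Baichenjia/PBRL), and for brevity the sketch shares one pessimistic target across ensemble members instead of the paper's bootstrapped per-member targets.

```python
# A minimal sketch of a PBRL-style pessimistic ensemble update in PyTorch.
# Names (QEnsemble, pessimistic_targets) are illustrative placeholders,
# not the authors' code; default values follow Table 2 of the paper.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=(256, 256, 256)):
    """FC(256,256,256) critic body, as listed in Table 2."""
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class QEnsemble(nn.Module):
    """K bootstrapped critics; ensemble disagreement serves as uncertainty."""

    def __init__(self, obs_dim, act_dim, k=10):  # K = 10 (Table 2)
        super().__init__()
        self.members = nn.ModuleList([mlp(obs_dim + act_dim, 1) for _ in range(k)])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return torch.stack([m(x) for m in self.members], dim=0)  # (K, B, 1)

    def uncertainty(self, obs, act):
        return self(obs, act).std(dim=0)  # std across the K critics


def pessimistic_targets(q_ens, q_target_ens, batch, policy,
                        beta_in=0.01, beta_ood=5.0, gamma=0.99, n_ood=10):
    """In-distribution target penalised by beta_in * uncertainty; OOD
    pseudo-target penalised by beta_ood * uncertainty (beta_ood decays
    from 5.0 towards 0.2 over training in the paper)."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        # In-distribution: standard TD target minus a small uncertainty penalty.
        next_act = policy(next_obs)
        q_next = q_target_ens(next_obs, next_act).mean(dim=0)
        u_next = q_target_ens.uncertainty(next_obs, next_act)
        target_in = rew + gamma * (1.0 - done) * (q_next - beta_in * u_next)

        # OOD regularisation: actions from the current policy at dataset states.
        obs_ood = obs.repeat_interleave(n_ood, dim=0)
        act_ood = policy(obs_ood)
        q_ood = q_ens(obs_ood, act_ood).mean(dim=0)
        u_ood = q_ens.uncertainty(obs_ood, act_ood)
        # Clipping at zero is a stabilising choice; details follow the paper.
        target_ood = torch.clamp(q_ood - beta_ood * u_ood, min=0.0)
    return target_in, (obs_ood, act_ood, target_ood)
```

The ensemble standard deviation acts as the epistemic-uncertainty proxy: in-distribution transitions receive the small β_in penalty, while actions sampled from the current policy (out-of-distribution with respect to the dataset) receive the much larger, decaying β_ood penalty.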
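
For the open-datasets row, the Gym-MuJoCo offline data referenced above is most commonly obtained through the `d4rl` package rather than by downloading the raw files. The snippet below is a sketch assuming `d4rl` and MuJoCo are installed; the task name is an example, not a prescription from the paper.

```python
# Sketch: loading a D4RL Gym-MuJoCo offline dataset (assumes d4rl and MuJoCo are installed).
import gym
import d4rl  # importing d4rl registers the offline environments with gym

# Example task; the paper evaluates the v2 Gym-MuJoCo datasets.
env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns numpy arrays: observations, actions, rewards,
# next_observations, terminals -- the transition tuples used for offline training.
dataset = d4rl.qlearning_dataset(env)
print({key: value.shape for key, value in dataset.items()})
```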