Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Authors: Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
Researcher Affiliation | Collaboration | 1 Institute for Interdisciplinary Sciences, Tsinghua University, China; 2 Institute of Automation, Chinese Academy of Sciences, China; 3 Fuxi AI Lab, Netease, Inc., Hangzhou, China; 4 Department of Automation, Tsinghua University, China; 5 Washington University in St. Louis, USA.
Pseudocode | Yes | Algorithm 1 BOORL, Offline Phase. Algorithm 2 BOORL, Online Phase.
Open Source Code | Yes | Our code is public online at https://github.com/YiqinYang/BOORL.
Open Datasets | Yes | To answer the questions above, we conduct experiments to test our proposed approach on the D4RL benchmark (Fu et al., 2020), which encompasses a variety of dataset qualities and domains.
Dataset Splits | Yes | We first gradually increase the proportion of the offline data in the training dataset from 15% to 50% to validate the effect of the sudden change in the replay data distribution.
Hardware Specification | Yes | All experiments are conducted on the same experimental setup, a single GeForce RTX 3090 GPU and an Intel Core i7-6700k CPU at 4.00GHz.
Software Dependencies | No | The paper mentions software used (e.g., TD3+BC, TD3) but does not provide specific version numbers for these or other relevant libraries/dependencies.
Experiment Setup | Yes | We outline the hyper-parameters used by BOORL in Table 7. Table 7 includes specific values such as 'Critic learning rate 3e-4', 'Mini-batch size 256', 'Discount factor 0.99', etc.
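
For readers who want to record these settings in code, the following is a minimal sketch, not the authors' implementation. It holds only the three hyper-parameter values quoted in the Experiment Setup row and shows how an offline-data proportion such as the 15% and 50% mentioned under Dataset Splits would translate into sample counts. The names BoorlHyperParams and split_batch are hypothetical, and applying the proportion to a single mini-batch is an assumption made for illustration; the paper states the proportion for the training dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoorlHyperParams:
    """Hypothetical container for the three values quoted from Table 7.

    The remaining entries of Table 7 are not reproduced here.
    """
    critic_lr: float = 3e-4   # 'Critic learning rate 3e-4'
    batch_size: int = 256     # 'Mini-batch size 256'
    discount: float = 0.99    # 'Discount factor 0.99'


def split_batch(batch_size: int, offline_fraction: float) -> Tuple[int, int]:
    """Return (n_offline, n_online) sample counts for one mini-batch.

    Assumption: the offline proportion is applied per mini-batch.
    """
    if not 0.0 <= offline_fraction <= 1.0:
        raise ValueError("offline_fraction must lie in [0, 1]")
    n_offline = round(batch_size * offline_fraction)
    return n_offline, batch_size - n_offline


params = BoorlHyperParams()
low_mix = split_batch(params.batch_size, 0.15)   # 15% offline data
high_mix = split_batch(params.batch_size, 0.50)  # 50% offline data
```

The dataclass is frozen so that a configuration cannot be changed accidentally between runs; any other hyper-parameter would need to be copied from Table 7 of the paper rather than guessed.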