reproducibilityindex.ai

Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Authors: Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
Researcher Affiliation	Collaboration	(1) Institute for Interdisciplinary Sciences, Tsinghua University, China; (2) Institute of Automation, Chinese Academy of Sciences, China; (3) Fuxi AI Lab, Netease, Inc., Hangzhou, China; (4) Department of Automation, Tsinghua University, China; (5) Washington University in St. Louis, USA.
Pseudocode	Yes	Algorithm 1 BOORL, Offline Phase. Algorithm 2 BOORL, Online Phase.
Open Source Code	Yes	Our code is public online at https://github.com/YiqinYang/BOORL.
Open Datasets	Yes	To answer the questions above, we conduct experiments to test our proposed approach on the D4RL benchmark (Fu et al., 2020), which encompasses a variety of dataset qualities and domains.
Dataset Splits	Yes	We first gradually increase the proportion of the offline data in the training dataset from 15% to 50% to validate the effect of the sudden change in the replay data distribution.
Hardware Specification	Yes	All experiments are conducted on the same experimental setup, a single GeForce RTX 3090 GPU and an Intel Core i7-6700k CPU at 4.00GHz.
Software Dependencies	No	The paper mentions software used (e.g., TD3+BC, TD3) but does not provide specific version numbers for these or other relevant libraries/dependencies.
Experiment Setup	Yes	We outline the hyper-parameters used by BOORL in Table 7. Table 7 includes specific values such as 'Critic learning rate 3e-4', 'Mini-batch size 256', 'Discount factor 0.99', etc.
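
For readers who want to record the reported settings in a script, the three hyper-parameter values quoted in the Experiment Setup row could be captured roughly as follows. This is a minimal sketch, not the authors' configuration file: the class name, field names, and the helper function are hypothetical, and only the three numeric values come from the entry above. The helper assumes the 15% to 50% offline-data proportion from the Dataset Splits row is applied per mini-batch, which the entry does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoorlHyperparams:
    """Hypothetical container; only the three default values are quoted from the entry."""
    critic_learning_rate: float = 3e-4
    mini_batch_size: int = 256
    discount_factor: float = 0.99


def offline_samples_per_batch(batch_size: int, offline_fraction: float) -> int:
    """Count of offline transitions in one mini-batch.

    Assumption: the offline proportion is applied per mini-batch.
    """
    if not 0.0 <= offline_fraction <= 1.0:
        raise ValueError("offline_fraction must lie in [0, 1]")
    return round(batch_size * offline_fraction)


config = BoorlHyperparams()
low = offline_samples_per_batch(config.mini_batch_size, 0.15)
high = offline_samples_per_batch(config.mini_batch_size, 0.50)
print(config, low, high)
```

The remaining Table 7 entries are not listed in this index entry and would need to be taken from the paper itself.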