Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

Authors: Yi Ma, Jianye Hao, Xiaohan Hu, Yan Zheng, Chenjun Xiao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks, clearly demonstrating its superiority over behavior regularization.
Researcher Affiliation | Collaboration | Yi Ma (1,2), Jianye Hao (3,4), Xiaohan Hu (3), Yan Zheng (3), Chenjun Xiao (5). 1: School of Computer and Information Technology, Shanxi University, mayi@sxu.edu.cn; 2: Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education; 3: College of Intelligence and Computing, Tianjin University, {jianye.hao, huxiaohan, yanzheng}@tju.edu.cn; 4: Noah's Ark Lab, Huawei; 5: The Chinese University of Hong Kong, Shenzhen, chenjunx@cuhk.edu.cn
Pseudocode | Yes | We give the pseudocode of both CPI and CPI-RE in Algorithm 1.
Open Source Code | Yes | Code is provided at https://github.com/mamengyiyi/CPI.
Open Datasets | Yes | Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks...
Dataset Splits | No | The paper states that it uses the D4RL benchmarks but does not describe training/validation/test splits, such as percentages or sample counts for each split.
Hardware Specification | Yes | All experiments are run on a GeForce GTX 2080TI GPU.
Software Dependencies | No | The paper mentions using TD3+BC as the base for its modifications and refers to a GitHub repository, but it does not give version numbers for the programming language (e.g., Python), the deep learning framework (e.g., PyTorch), or other supporting libraries.
Experiment Setup | Yes | Table 3 (CPI hyperparameters) and Table 4 (regularization parameter τ and weighting factor λ of CPI for all datasets) detail the experimental setup, including specific hyperparameter values.