Iteratively Refined Behavior Regularization for Offline Reinforcement Learning
Authors: Yi Ma, Jianye Hao, Xiaohan Hu, Yan Zheng, Chenjun Xiao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks, clearly demonstrating its superiority over behavior regularization. |
| Researcher Affiliation | Collaboration | Yi Ma (1,2), Jianye Hao (3,4), Xiaohan Hu (3), Yan Zheng (3), Chenjun Xiao (5). (1) School of Computer and Information Technology, Shanxi University, mayi@sxu.edu.cn; (2) Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education; (3) College of Intelligence and Computing, Tianjin University, {jianye.hao, huxiaohan, yanzheng}@tju.edu.cn; (4) Noah's Ark Lab, Huawei; (5) The Chinese University of Hong Kong, Shenzhen, chenjunx@cuhk.edu.cn |
| Pseudocode | Yes | We give the pseudocode of both CPI and CPI-RE in Algorithm 1. |
| Open Source Code | Yes | Code is provided at https://github.com/mamengyiyi/CPI. |
| Open Datasets | Yes | Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks... |
| Dataset Splits | No | The paper states it uses D4RL benchmarks but does not explicitly provide details on training/validation/test dataset splits, such as percentages or specific sample counts for each split. |
| Hardware Specification | Yes | All experiments are run on a GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using TD3+BC as a base for modifications and refers to a GitHub repository, but it does not specify explicit version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch), or other ancillary software libraries. |
| Experiment Setup | Yes | Table 3 (CPI hyperparameters) and Table 4 (regularization parameter τ and weighting factor λ of CPI for all datasets) detail the experimental setup, including specific hyperparameter values. |