Boosting Offline Reinforcement Learning with Residual Generative Modeling
Authors: Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on a set of benchmark datasets, we verify the effectiveness of our method in boosting offline RL performance over state-of-the-art methods. Furthermore, in the scenario of the multiplayer online battle arena (MOBA) game Honor of Kings, which involves a large state-action space, our proposed method can also achieve excellent performance. |
| Researcher Affiliation | Collaboration | (1) Tencent AI Lab, Shenzhen, China; (2) The Pennsylvania State University, University Park, USA |
| Pseudocode | No | The paper describes the training process and various components but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines but does not provide a specific link or explicit statement about the availability of their own code for the proposed AQL method. |
| Open Datasets | Yes | We evaluate offline RL algorithms by training on these fixed datasets provided by the open-access benchmarking dataset D4RL [Fu et al., 2020] and evaluating the learned policies on the real environments. (A minimal D4RL loading sketch is given after the table.) |
| Dataset Splits | No | The paper mentions 'samples collected during the evaluation process are only used for testing and not used for training' but does not provide specific numerical dataset split information (percentages or counts) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines, but it does not specify any particular software dependencies with version numbers (e.g., Python version, specific library versions). |
| Experiment Setup | Yes | To keep the same parameter settings as BCQ and BEAR, we set K = 2 (number of candidate Q-functions), λ = 0.75 (minimum weighting factor), ϵ = 0.05 (policy constraint threshold), and B = 1000 (total training steps). (See the hedged configuration sketch after the table.) |
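
Since the "Open Datasets" row refers to the D4RL benchmark, the following is a minimal sketch of how a D4RL dataset is typically loaded in Python. The environment name `halfcheetah-medium-v2` is an illustrative assumption; the paper evaluates on D4RL tasks but this snippet is not taken from the authors' code.

```python
# Minimal sketch (not from the paper): loading a D4RL benchmark dataset.
# 'halfcheetah-medium-v2' is an assumed example task chosen for illustration.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns transition arrays keyed by 'observations',
# 'actions', 'rewards', 'next_observations', and 'terminals'.
dataset = d4rl.qlearning_dataset(env)

print(dataset["observations"].shape, dataset["actions"].shape)
```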
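
The hyperparameters quoted in the "Experiment Setup" row could be collected into a single configuration object, as sketched below. The `AQLConfig` class and its field names are illustrative assumptions, not the authors' implementation; only the numeric values come from the quoted text.

```python
# Hedged sketch: gathering the quoted hyperparameters into one config object.
# Class and field names are assumptions made for illustration.
from dataclasses import dataclass


@dataclass
class AQLConfig:
    num_q_functions: int = 2          # K: number of candidate Q-functions
    min_q_weight: float = 0.75        # lambda: minimum weighting factor
    policy_threshold: float = 0.05    # epsilon: policy constraint threshold
    total_training_steps: int = 1000  # B: total training steps


config = AQLConfig()
print(config)
```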