Boosting Offline Reinforcement Learning with Residual Generative Modeling

Authors: Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through experiments on a set of benchmark datasets, we verify the effectiveness of our method in boosting offline RL performance over state-of-the-art methods. Furthermore, in the scenario of the multiplayer online battle arena (MOBA) game Honor of Kings, which involves large state-action space, our proposed method can also achieve excellent performance. |
| Researcher Affiliation | Collaboration | (1) Tencent AI Lab, Shenzhen, China; (2) The Pennsylvania State University, University Park, USA |
| Pseudocode | No | The paper describes the training process and various components but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines but does not provide a specific link or explicit statement about the availability of their own code for the proposed AQL method. |
| Open Datasets | Yes | We evaluate offline RL algorithms by training on these fixed datasets provided by open-access benchmarking dataset D4RL [Fu et al., 2020] and evaluating the learned policies on the real environments. (See the dataset-loading sketch after the table.) |
| Dataset Splits | No | The paper mentions 'samples collected during the evaluation process are only used for testing and not used for training' but does not provide specific numerical dataset split information (percentages or counts) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines, but it does not specify any particular software dependencies with version numbers (e.g., Python version, specific library versions). |
| Experiment Setup | Yes | To keep the same parameter settings as BCQ and BEAR, we set K = 2 (number of candidate Q-functions), λ = 0.75 (minimum weighting factor), ϵ = 0.05 (policy constraint threshold), and B = 1000 (total training steps). (See the configuration sketch after the table.) |
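The D4RL benchmark cited in the Open Datasets row is distributed as a Python package, so its fixed offline datasets can be loaded directly. The snippet below is a minimal sketch, not code from the paper; the task name `halfcheetah-medium-v0` is only an illustrative D4RL dataset and is not necessarily one of the tasks the authors evaluated.

```python
# Minimal sketch: loading a fixed offline dataset from D4RL [Fu et al., 2020].
# The task name is illustrative; the paper's exact task list is not reproduced here.
import gym
import d4rl  # importing d4rl registers its offline datasets/environments with gym

env = gym.make("halfcheetah-medium-v0")

# qlearning_dataset() returns transition arrays suitable for off-policy training:
# 'observations', 'actions', 'rewards', 'next_observations', 'terminals'.
dataset = d4rl.qlearning_dataset(env)

print(dataset["observations"].shape)
print(dataset["actions"].shape)
print(dataset["rewards"].shape)
```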
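For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration block. The sketch below is hypothetical: only the four values (K, λ, ϵ, B) come from the paper, while the dictionary keys and the training-loop stub are assumptions made here for illustration, since the authors' AQL code is not released.

```python
# Hypothetical configuration sketch collecting the hyperparameters quoted above.
# Only the four numeric values are from the paper; names and structure are assumed.
aql_config = {
    "num_candidate_q_functions": 2,       # K: number of candidate Q-functions
    "min_weighting_factor": 0.75,         # lambda: minimum weighting factor
    "policy_constraint_threshold": 0.05,  # epsilon: policy constraint threshold
    "total_training_steps": 1000,         # B: total training steps
}

def train(config):
    """Hypothetical training-loop stub; the paper's actual update rule is omitted."""
    for step in range(config["total_training_steps"]):
        pass  # one offline RL update per step would go here

train(aql_config)
```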