Boosting Offline Reinforcement Learning with Residual Generative Modeling
Authors: Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on a set of benchmark datasets, we verify the effectiveness of our method in boosting offline RL performance over state-of-the-art methods. Furthermore, in the scenario of the multiplayer online battle arena (MOBA) game Honor of Kings, which involves a large state-action space, our proposed method can also achieve excellent performance. |
| Researcher Affiliation | Collaboration | (1) Tencent AI Lab, Shenzhen, China; (2) The Pennsylvania State University, University Park, USA |
| Pseudocode | No | The paper describes the training process and various components but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines but does not provide a specific link or explicit statement about the availability of their own code for the proposed AQL method. |
| Open Datasets | Yes | We evaluate offline RL algorithms by training on these fixed datasets provided by the open-access benchmarking dataset D4RL [Fu et al., 2020] and evaluating the learned policies on the real environments. (A minimal D4RL loading sketch is given after the table.) |
| Dataset Splits | No | The paper mentions 'samples collected during the evaluation process are only used for testing and not used for training' but does not provide specific numerical dataset split information (percentages or counts) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'open-source codes provided in corresponding methods' for baselines, but it does not specify any particular software dependencies with version numbers (e.g., Python version, specific library versions). |
| Experiment Setup | Yes | To keep the same parameter settings as BCQ and BEAR, we set K = 2 (number of candidate Q-functions), λ = 0.75 (minimum weighting factor), ϵ = 0.05 (policy constraint threshold), and B = 1000 (total training steps). (See the hedged configuration sketch after the table.) |
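
Since the "Open Datasets" row refers to the D4RL benchmark, the following is a minimal sketch of how a D4RL dataset is typically loaded in Python. The environment name `halfcheetah-medium-v2` is an illustrative assumption; the paper evaluates on D4RL tasks but this snippet is not taken from the authors' code.

```python
# Minimal sketch (not from the paper): loading a D4RL benchmark dataset.
# 'halfcheetah-medium-v2' is an assumed example task chosen for illustration.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns transition arrays keyed by 'observations',
# 'actions', 'rewards', 'next_observations', and 'terminals'.
dataset = d4rl.qlearning_dataset(env)

print(dataset["observations"].shape, dataset["actions"].shape)
```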
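
The hyperparameters quoted in the "Experiment Setup" row could be collected into a single configuration object, as sketched below. The `AQLConfig` class and its field names are illustrative assumptions, not the authors' implementation; only the numeric values come from the quoted text.

```python
# Hedged sketch: gathering the quoted hyperparameters into one config object.
# Class and field names are assumptions made for illustration.
from dataclasses import dataclass


@dataclass
class AQLConfig:
    num_q_functions: int = 2          # K: number of candidate Q-functions
    min_q_weight: float = 0.75        # lambda: minimum weighting factor
    policy_threshold: float = 0.05    # epsilon: policy constraint threshold
    total_training_steps: int = 1000  # B: total training steps


config = AQLConfig()
print(config)
```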