Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Authors: Kaiyang Guo, Yunfeng Shao, Yanhui Geng

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks." and Section 5 (Experiments), which discusses the D4RL benchmark, performance comparison, and learning curves.
Researcher Affiliation | Industry | Kaiyang Guo, Yunfeng Shao, Yanhui Geng (Huawei Noah's Ark Lab)
Pseudocode | Yes | "The complete algorithm is listed in Appendix D."
Open Source Code | Yes | "The code is available online." Code is released at https://github.com/huawei-noah/HEBO/tree/master/PMDB and https://gitee.com/mindspore/models/tree/master/research/rl/pmdb.
Open Datasets | Yes | "We consider the Gym domains in the D4RL benchmark [42] to answer these questions." (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions the D4RL benchmark and a static dataset D = {(s, a, r, s')} but does not explicitly provide percentages, sample counts, or a methodology for splitting the data into training, validation, and test sets.
Hardware Specification | No | The paper does not state the hardware used for the experiments, such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper points to MindSpore through the code repository link (gitee.com/mindspore/models/tree/master/research/rl/pmdb) but does not give version numbers for MindSpore or any other software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | "Especially, the hyperparameters in sampling procedure are N = 10 and k = 2." and "Table 2 lists the impact of k. In each setting, we evaluate the learned policy in both the true MDP and the AMG." (An illustrative sketch of such a sampling step appears below.)
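The D4RL Gym datasets cited in the Open Datasets row are publicly available through the d4rl Python package. Below is a minimal loading sketch, assuming the d4rl and gym packages are installed; the dataset name halfcheetah-medium-v2 is an illustrative choice and is not taken from the paper.

```python
# Minimal sketch of loading a D4RL Gym dataset.
# Assumes the `gym` and `d4rl` packages are installed; the dataset name
# below is an illustrative choice, not one confirmed by the paper.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)

# The static dataset D = {(s, a, r, s')} used for offline training.
observations = dataset["observations"]
actions = dataset["actions"]
rewards = dataset["rewards"]
next_observations = dataset["next_observations"]
terminals = dataset["terminals"]
print(observations.shape, actions.shape)
```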
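For context on the Experiment Setup row: as we read the paper, N controls how many candidate next states are drawn from the dynamics belief and k selects which order statistic of their values is used, so that smaller k means a more pessimistic transition. The snippet below is a hypothetical illustration of such an order-statistic selection, not the authors' released implementation; the dynamics sampler, value function, and all names are assumptions made for the sketch.

```python
import numpy as np

def pessimistic_next_state(sample_dynamics, value_fn, state, action, N=10, k=2):
    """Hypothetical sketch of an order-statistic sampling step.

    Draws N candidate next states, each from a dynamics model sampled from
    the (approximate) belief, and returns the candidate whose value is the
    k-th smallest (k=1 most pessimistic, k=N most optimistic). This is an
    illustration under assumptions, not the released PMDB code.
    """
    candidates = [sample_dynamics(state, action) for _ in range(N)]
    values = np.array([value_fn(s_next) for s_next in candidates])
    order = np.argsort(values)           # indices sorted by value, ascending
    return candidates[order[k - 1]]      # k-th smallest value (1-indexed)
```

In practice, sample_dynamics could wrap an ensemble of learned probabilistic models, with one ensemble member drawn per call to approximate sampling from the dynamics belief.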