Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Authors: Kaiyang Guo, Yunfeng Shao, Yanhui Geng

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks." and Section 5 (Experiments), which discusses the D4RL benchmark, performance comparison, and learning curves.
Researcher Affiliation | Industry | Kaiyang Guo, Yunfeng Shao, Yanhui Geng (Huawei Noah's Ark Lab)
Pseudocode | Yes | "The complete algorithm is listed in Appendix D."
Open Source Code | Yes | "The code is available online." Code is released at https://github.com/huawei-noah/HEBO/tree/master/PMDB and https://gitee.com/mindspore/models/tree/master/research/rl/pmdb.
Open Datasets | Yes | "We consider the Gym domains in the D4RL benchmark [42] to answer these questions." (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions the D4RL benchmark and a static dataset D = {(s, a, r, s')} but does not explicitly provide percentages, sample counts, or a methodology for splitting the data into training, validation, and test sets.
Hardware Specification | No | The paper does not state the hardware used for the experiments, such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper points to MindSpore through the code repository link (gitee.com/mindspore/models/tree/master/research/rl/pmdb) but does not give version numbers for MindSpore or any other software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | "Especially, the hyperparameters in sampling procedure are N = 10 and k = 2." and "Table 2 lists the impact of k. In each setting, we evaluate the learned policy in both the true MDP and the AMG." (An illustrative sketch of such a sampling step appears below.)
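The D4RL Gym datasets cited in the Open Datasets row are publicly available through the d4rl Python package. Below is a minimal loading sketch, assuming the d4rl and gym packages are installed; the dataset name halfcheetah-medium-v2 is an illustrative choice and is not taken from the paper.

```python
# Minimal sketch of loading a D4RL Gym dataset.
# Assumes the `gym` and `d4rl` packages are installed; the dataset name
# below is an illustrative choice, not one confirmed by the paper.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)

# The static dataset D = {(s, a, r, s')} used for offline training.
observations = dataset["observations"]
actions = dataset["actions"]
rewards = dataset["rewards"]
next_observations = dataset["next_observations"]
terminals = dataset["terminals"]
print(observations.shape, actions.shape)
```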
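For context on the Experiment Setup row: as we read the paper, N controls how many candidate next states are drawn from the dynamics belief and k selects which order statistic of their values is used, so that smaller k means a more pessimistic transition. The snippet below is a hypothetical illustration of such an order-statistic selection, not the authors' released implementation; the dynamics sampler, value function, and all names are assumptions made for the sketch.

```python
import numpy as np

def pessimistic_next_state(sample_dynamics, value_fn, state, action, N=10, k=2):
    """Hypothetical sketch of an order-statistic sampling step.

    Draws N candidate next states, each from a dynamics model sampled from
    the (approximate) belief, and returns the candidate whose value is the
    k-th smallest (k=1 most pessimistic, k=N most optimistic). This is an
    illustration under assumptions, not the released PMDB code.
    """
    candidates = [sample_dynamics(state, action) for _ in range(N)]
    values = np.array([value_fn(s_next) for s_next in candidates])
    order = np.argsort(values)           # indices sorted by value, ascending
    return candidates[order[k - 1]]      # k-th smallest value (1-indexed)
```

In practice, sample_dynamics could wrap an ensemble of learned probabilistic models, with one ensemble member drawn per call to approximate sampling from the dynamics belief.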