Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
Authors: Kaiyang Guo, Yunfeng Shao, Yanhui Geng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks. Section 5 (Experiments) discusses the D4RL benchmark, performance comparison, and learning curves. |
| Researcher Affiliation | Industry | Kaiyang Guo, Yunfeng Shao, Yanhui Geng (Huawei Noah's Ark Lab) |
| Pseudocode | Yes | The complete algorithm is listed in Appendix D. |
| Open Source Code | Yes | The code is available online. Code is released at https://github.com/huawei-noah/HEBO/tree/master/PMDB and https://gitee.com/mindspore/models/tree/master/research/rl/pmdb. |
| Open Datasets | Yes | We consider the Gym domains in the D4RL benchmark [42] to answer these questions. |
| Dataset Splits | No | The paper mentions using the 'D4RL benchmark' and a 'static dataset D = {(s, a, r, s′)}' but does not explicitly provide specific percentages, sample counts, or a detailed methodology for splitting the dataset into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'MindSpore' in the code repository link (gitee.com/mindspore/models/tree/master/research/rl/pmdb) but does not provide specific version numbers for MindSpore or any other software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | Yes | In particular, the hyperparameters of the sampling procedure are N = 10 and k = 2, and Table 2 lists the impact of k. In each setting, the learned policy is evaluated in both the true MDP and the AMG. (See the sketch after this table.) |
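
The N and k hyperparameters above suggest an order-statistic style of pessimistic selection over sampled transitions. Below is a minimal sketch, assuming the sampling procedure draws N candidate next states from the learned dynamics belief and keeps the one whose estimated value is the k-th smallest; the function names, the toy candidate generator, and the stand-in value function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kth_smallest_value_choice(candidates, value_fn, k):
    """Pick the candidate whose estimated value is the k-th smallest.

    k = 1 is the most pessimistic choice; k = len(candidates) is the
    most optimistic. This is only an illustrative order-statistic rule.
    """
    values = np.array([value_fn(s) for s in candidates])
    order = np.argsort(values)           # ascending value order
    return candidates[order[k - 1]]

# Toy usage with the reported hyperparameters N = 10, k = 2.
rng = np.random.default_rng(0)
N, k = 10, 2
candidates = rng.normal(size=(N, 3))     # N sampled next states (toy 3-dim states)
value_fn = lambda s: float(s.sum())      # stand-in for a learned value estimate
next_state = kth_smallest_value_choice(candidates, value_fn, k)
print(next_state)
```

With k = 2 out of N = 10 samples, the selection is pessimistic but not worst-case, which is consistent with the idea of modulating the degree of pessimism rather than always taking the minimum.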