Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
Authors: Kaiyang Guo, Shao Yunfeng, Yanhui Geng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks." Section 5 (Experiments) discusses the D4RL benchmark, performance comparisons, and learning curves. |
| Researcher Affiliation | Industry | Kaiyang Guo, Yunfeng Shao, Yanhui Geng (Huawei Noah's Ark Lab) |
| Pseudocode | Yes | The complete algorithm is listed in Appendix D. |
| Open Source Code | Yes | The code is available online. Code is released at https://github.com/huawei-noah/HEBO/tree/master/PMDB and https://gitee.com/mindspore/models/tree/master/research/rl/pmdb. |
| Open Datasets | Yes | We consider the Gym domains in the D4RL benchmark [42] to answer these questions. |
| Dataset Splits | No | The paper mentions using the 'D4RL benchmark' and a 'static dataset D = {(s, a, r, s')}' but does not explicitly provide specific percentages, sample counts, or a detailed methodology for splitting the dataset into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'MindSpore' in the code repository link (gitee.com/mindspore/models/tree/master/research/rl/pmdb) but does not provide specific version numbers for MindSpore or any other software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | Yes | "In particular, the hyperparameters of the sampling procedure are N = 10 and k = 2." Table 2 lists the impact of k; in each setting, the learned policy is evaluated in both the true MDP and the AMG. |
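The Experiment Setup row reports N = 10 and k = 2 for the sampling procedure. Our reading of the paper is that pessimism enters by drawing N candidate transitions and keeping the one whose estimated value is the k-th smallest. The sketch below illustrates only that selection rule, under stated assumptions. The helper names `kth_smallest_index` and `pessimistic_next_state`, and the callables `sample_next_state` and `value_fn`, are hypothetical and are not taken from the released code. Appendix D of the paper and the linked repositories remain the authoritative description of the algorithm.

```python
import numpy as np


def kth_smallest_index(values, k):
    """Index of the k-th smallest entry (k is 1-indexed, so k=1 is the minimum)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    # Stable ascending sort keeps tie-breaking deterministic.
    order = np.argsort(values, kind="stable")
    return int(order[k - 1])


def pessimistic_next_state(sample_next_state, value_fn, state, action, rng, n=10, k=2):
    """Hypothetical sketch of a k-th-minimum sampling step.

    Assumptions (not from the paper's code):
      sample_next_state(state, action, rng) returns one candidate next state,
        e.g. drawn from a randomly chosen member of a learned dynamics ensemble.
      value_fn(next_state) returns a scalar value estimate.
    """
    # Draw n candidate transitions for the same (state, action) pair.
    candidates = [sample_next_state(state, action, rng) for _ in range(n)]
    values = np.array([value_fn(s) for s in candidates], dtype=float)
    # Keep the candidate with the k-th smallest value: k=1 is fully pessimistic,
    # and larger k relaxes the pessimism.
    idx = kth_smallest_index(values, k)
    return candidates[idx], values[idx]


if __name__ == "__main__":
    # Toy usage: 1-D state, Gaussian "dynamics", and value equal to the next state.
    rng = np.random.default_rng(0)

    def toy_dynamics(s, a, g):
        return s + a + g.normal(scale=0.5)

    def toy_value(s_next):
        return s_next

    s_next, v = pessimistic_next_state(toy_dynamics, toy_value, 0.0, 1.0, rng, n=10, k=2)
    print(f"selected next state {s_next:.3f} with value {v:.3f}")
```

With N = 10 and k = 2, the selected transition is the second-worst of ten samples. This is less conservative than taking the minimum (k = 1), which is one way to see why the paper studies the impact of k separately.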