Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Weighted model estimation for offline model-based reinforcement learning
Authors: Toru Hishinuma, Kei Senda
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate the effectiveness of weighting with the artificial weight. 6 Numerical Experiment |
| Researcher Affiliation | Academia | Toru Hishinuma Kyoto University EMAIL Kei Senda Kyoto University EMAIL |
| Pseudocode | Yes | Algorithm 1 Weighted model estimation for policy evaluation (full version). |
| Open Source Code | No | The paper mentions modifying existing code ('This paper implements SAC by modifying the implementation code by [36]') but does not explicitly state that the source code for their own methodology is made publicly available or provide a link to it. |
| Open Datasets | Yes | This paper studies policy optimization on the D4RL Benchmark [33] based on the MuJoCo simulator [34]. |
| Dataset Splits | No | The paper mentions using the D4RL Benchmark datasets but does not explicitly provide specific training/validation/test dataset splits, such as percentages or sample counts, within its text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or any other computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch (implicitly via a reference to a PyTorch SAC implementation) and the MuJoCo simulator, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The agent uses Pθ represented by two-layer neural networks with 8 units with tanh activation. |