Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On Evaluating Policies for Robust POMDPs
Authors: Merlijn Krale, Eline M. Bovy, Maris F. L. Galesloot, Thiago Simão, Nils Jansen
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation shows that (1) our proposed benchmarks cannot be solved by assuming naive nature policies, (2) our method of evaluating policies is accurate, and (3) the upper bounds provide solid baselines for evaluation. |
| Researcher Affiliation | Academia | Merlijn Krale (Radboud University, Nijmegen, The Netherlands); Eline M. Bovy (Radboud University, Nijmegen, The Netherlands); Maris F. L. Galesloot (Radboud University, Nijmegen, The Netherlands); Thiago D. Simão (Eindhoven University of Technology, Eindhoven, The Netherlands); Nils Jansen (Ruhr-University Bochum, Bochum, Germany & Radboud University, Nijmegen, The Netherlands) |
| Pseudocode | No | The paper describes algorithms and modifications to existing methods (e.g., RHSVI) but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement all methods in a Julia framework (based on POMDPs.jl [12]) to facilitate future research; available on Zenodo [29]. |
| Open Datasets | Yes | Secondly, we lift several POMDPs from the literature into RPOMDPs: TIGER [6], MINIHALLWAY [33], and ALOHA [24], as well as an expanded variant of HEAVENORHELL [4] (also used in [45]). |
| Dataset Splits | No | The paper uses simulated environments and benchmarks rather than static datasets with explicit training, validation, or test splits. The concept of dataset splits is not applicable in this context. |
| Hardware Specification | Yes | All experiments were conducted in Julia (version 1.11.5) on the same Ubuntu machine (version 22.04.5 LTS), which has an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz and 256GB RAM (8 x 32GB DDR4-3200). |
| Software Dependencies | Yes | We implement our evaluation method in the Julia programming language, using a variant of the POMDPs.jl framework [12] for RPOMDPs with interval uncertainty sets... All experiments were conducted in Julia (version 1.11.5)... |
| Experiment Setup | Yes | We use a discount factor of γ = 1 for TOY, of γ = 0.99 for ECHO and HEAVENORHELL, and of γ = 0.95 for all other environments. ... For evaluation, we run MCTS five times and report the lowest value... |
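
The evaluation protocol quoted in the Experiment Setup row (per-environment discount factors, five runs, lowest value reported) can be illustrated with a minimal Python sketch. This is not the authors' Julia implementation: `discount_for`, `pessimistic_value`, and the `evaluate_once` callable are hypothetical names introduced here for illustration, and the MCTS evaluation itself is left as a placeholder.

```python
from typing import Callable, Dict

# Discount factors as quoted in the Experiment Setup row.
DISCOUNTS: Dict[str, float] = {"TOY": 1.0, "ECHO": 0.99, "HEAVENORHELL": 0.99}
DEFAULT_DISCOUNT = 0.95  # quoted value for "all other environments"


def discount_for(env_name: str) -> float:
    """Look up the discount factor for an environment (hypothetical helper)."""
    return DISCOUNTS.get(env_name.upper(), DEFAULT_DISCOUNT)


def pessimistic_value(evaluate_once: Callable[[int], float], runs: int = 5) -> float:
    """Run a stochastic evaluation `runs` times and keep the lowest value.

    `evaluate_once` is an assumed stand-in for a single MCTS-based evaluation;
    it receives the run index, which a caller could use as a random seed.
    """
    return min(evaluate_once(run) for run in range(runs))
```

Here `pessimistic_value` only captures the "run five times, report the lowest" step; how each run is computed is outside the scope of this sketch.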