Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability

Authors: Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, Nils Jansen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that we can compute policies for standard POMDP benchmarks extended to the multi-environment setting. ...We study ME-POMDPs and devise algorithms to compute robust policies against any adversarial choice of POMDP in the ME-POMDP. ...6 Experimental Evaluation The implementation of the LPs (3) and (4) along with AB-HSVI (Algorithm 1) forms a solution method for ME-POMDPs, and we answer the following research questions regarding this method. (Q1) Scalability: What is the computational cost of solving AB-POMDPs? (Q2) Baseline comparison: What is the added difficulty of robustness against adversarial beliefs compared to a naive baseline of solving individual POMDPs? (Q3) Model formulation: Does the model type, i.e., whether the problem is formulated as a ME-POMDP, PO-MEMDP, MO-POMDP or AB-POMDP, influence the performance? As no benchmarks exist for ME-POMDPs, we introduce two benchmarks for our experimental evaluation. ...Tables 1 and 2 show the results of running AB-HSVI on the Bird problem and Rock Sample.
Researcher Affiliation	Collaboration	Eline M. Bovy Radboud University Nijmegen, The Netherlands EMAIL Caleb Probine The University of Texas at Austin Austin, TX, USA EMAIL Marnix Suilen University of Antwerp Flanders Make Antwerp, Belgium EMAIL Ufuk Topcu The University of Texas at Austin Austin, TX, USA EMAIL Nils Jansen Ruhr-University Bochum & Radboud University Bochum, Germany & Nijmegen, The Netherlands EMAIL
Pseudocode	Yes	Algorithm 1 AB-HSVI Algorithm 2 Extracting policies from α-vectors. Algorithm 3 Extracting policies from α-vectors with pruning.
Open Source Code	Yes	All code is available at [6]. [6] Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, and Nils Jansen. Code for the AB-HSVI algorithm and the experiments in the paper: "Multi-environment POMDPs: Discrete model uncertainty under partial observability" (NeurIPS 2025), 2025. URL https://doi.org/10.5281/zenodo.17425571.
Open Datasets	Yes	As no benchmarks exist for ME-POMDPs, we introduce two benchmarks for our experimental evaluation. The first benchmark is based on the endangered bird preservation case study presented in Appendix B, which we shall refer to as the Bird problem. ...For the second benchmark, we extend Rock Sample [40] to ME-POMDPs. ...All code is available at [6]. [6] Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, and Nils Jansen. Code for the AB-HSVI algorithm and the experiments in the paper: "Multi-environment POMDPs: Discrete model uncertainty under partial observability" (NeurIPS 2025), 2025. URL https://doi.org/10.5281/zenodo.17425571.
Dataset Splits	No	The paper introduces new benchmarks (
Hardware Specification	Yes	We run experiments on a computer with an Intel Core i9-10980XE 3.00GHz processor and 256GB of RAM.
Software Dependencies	Yes	We use Gurobi [18] to solve LPs. [18] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2024. URL https://www.gurobi.com.
Experiment Setup	Yes	We set a time limit tl of 3600 seconds, discount factor γ = 0.95, and set HSVI's gap threshold to ϵ = 0.1 Rmin, where Rmin is the minimum problem reward.
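
The settings quoted in the Experiment Setup row can be collected in a small configuration object. The following is a minimal sketch, not the authors' code: the class name `ExperimentSetup`, its field names, and the `gap_threshold` helper are hypothetical, and the threshold formula is read literally from the quoted text (how a negative Rmin is handled is not stated there).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the settings quoted in the table."""

    time_limit_s: float = 3600.0  # time limit tl, in seconds
    discount: float = 0.95        # discount factor gamma
    gap_factor: float = 0.1       # multiplier applied to Rmin

    def gap_threshold(self, r_min: float) -> float:
        # epsilon = 0.1 * Rmin, taken literally from the quoted setup;
        # r_min is the minimum problem reward and must be supplied per benchmark.
        return self.gap_factor * r_min


setup = ExperimentSetup()
```

Calling `setup.gap_threshold(r_min)` with a benchmark's minimum reward gives the gap threshold for that benchmark under this literal reading.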