Online Planning in POMDPs with Self-Improving Simulators
Authors: Jinke He, Miguel Suau, Hendrik Baier, Michael Kaisers, Frans A. Oliehoek
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results in two large domains show that, when integrated with POMCP, our approach allows planning with improving efficiency over time. We perform the evaluation on two large POMDPs introduced by [He et al., 2020], the Grab A Chair (GAC) domain and the Grid Traffic Control (GTC) domain, descriptions of which can be found in Appendix C.1. |
| Researcher Affiliation | Academia | 1Delft University of Technology, The Netherlands 2Centrum Wiskunde & Informatica, The Netherlands |
| Pseudocode | Yes | Algorithm 1 outlines our approach. |
| Open Source Code | No | The paper does not provide a link to source code for the described methodology. The provided arXiv link is for an extended version of the paper itself. |
| Open Datasets | Yes | We perform the evaluation on two large POMDPs introduced by [He et al., 2020], the Grab A Chair (GAC) domain and the Grid Traffic Control (GTC) domain, descriptions of which can be found in Appendix C.1. |
| Dataset Splits | No | The paper mentions training data and a test dataset, but does not specify validation sets or detailed splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions components like GRU and stochastic gradient descent, but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In all planning experiments with self-improving simulators, we start with an IALS that makes use of a completely untrained Î_θ, implemented by a GRU; after every real episode it is trained for 64 gradient steps with the accumulated data from the global simulations. The results are averaged over 2500 and 1000 individual runs for the GAC and GTC domains, respectively. ... allowing 1/64 and 1/16 seconds for each decision, respectively... We fix the number of POMCP simulations to 100 per planning step. |
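The experiment-setup quote describes a loop: after each real episode, the approximate influence predictor Î_θ is trained for 64 gradient steps on data accumulated from global simulations. The sketch below illustrates that loop's structure only; it is not the authors' implementation. The paper's Î_θ is a GRU, which is replaced here by a hypothetical single-parameter logistic model (`InfluencePredictor`), and `run_episode` is a made-up stand-in for the global simulator.

```python
import math
import random

class InfluencePredictor:
    """Hypothetical stand-in for the GRU-based influence predictor I_theta."""
    def __init__(self):
        self.w = 0.0  # single logistic weight instead of GRU parameters

    def prob(self, x):
        # predicted probability that the influence source variable is 1
        return 1.0 / (1.0 + math.exp(-self.w * x))

    def sgd_step(self, x, y, lr=0.1):
        # gradient of binary cross-entropy w.r.t. w (stochastic gradient descent,
        # as mentioned in the paper's Software Dependencies row)
        self.w -= lr * (self.prob(x) - y) * x

def run_episode(rng, length=32):
    """Hypothetical global simulation: yields (local feature, influence source) pairs."""
    return [(x, 1 if x > 0 else 0) for x in (rng.uniform(-1, 1) for _ in range(length))]

rng = random.Random(0)
predictor = InfluencePredictor()
dataset = []
for episode in range(10):                  # "real" episodes
    dataset.extend(run_episode(rng))       # accumulate data from global simulations
    for _ in range(64):                    # 64 gradient steps per episode (per the paper)
        x, y = rng.choice(dataset)
        predictor.sgd_step(x, y)

# After training, the influence-augmented local simulator (IALS) using this
# predictor would be queried by POMCP in place of the expensive global simulator.
print(predictor.w > 0)  # the predictor has learned the x>0 decision boundary
```

The key design point preserved from the quote is that training happens between episodes on the growing replay of global-simulation data, so later planning steps use a progressively better (cheaper) local simulator.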