Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

Authors: Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, Ufuk Topcu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes. We evaluate our RNN-based synthesis procedure on benchmark examples that are subject to either LTL specifications or expected cost specifications.
Researcher Affiliation | Collaboration | 1 The University of Texas at Austin; 2 Radboud University, Nijmegen, The Netherlands; 3 Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany; 4 Concept Engineering GmbH, Freiburg im Breisgau, Germany
Pseudocode | No | The paper includes a flowchart (Figure 1) to illustrate the workflow, but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper mentions the use of Keras and probabilistic model checkers like PRISM and Storm, but does not provide a link or explicit statement about the availability of the authors' source code for the described methodology.
Open Datasets | No | The paper describes generating data ("sample uniformly over all states of the MDP and generate finite paths... thereby creating multiple trajectory trees") or extending existing problem examples for benchmarks (e.g., Maze(c), Grid(c), RockSample), but does not provide concrete access information (link, DOI, formal citation with authors/year) for a publicly available or open dataset used in their experiments.
Dataset Splits | No | The paper describes creating a "training set D" from generated observation-action sequences but does not provide specific details on training, validation, and test dataset splits, percentages, or sample counts.
Hardware Specification | No | The paper states: "We evaluated on a 2.3 GHz machine with a 12 GB memory limit and a specified maximum computation time of 10^5 seconds." This gives the clock speed and memory limit but no specific CPU/GPU models or detailed computer specifications.
Software Dependencies | No | The paper mentions using "Keras" and probabilistic model checkers "PRISM" and "STORM", but does not provide specific version numbers for these software components.
Experiment Setup | No | The paper mentions using the Adam optimizer with a cross-entropy error function, but it does not provide specific hyperparameter values such as learning rate, batch size, or number of epochs for the experimental setup.
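
The data-generation step quoted in the Open Datasets and Dataset Splits rows (sampling start states uniformly and generating finite paths to build a "training set D" of observation-action sequences) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four-state toy MDP, the observation labels, the fixed memoryless strategy, and the names `TRANSITIONS`, `OBSERVATION`, `MDP_STRATEGY`, `sample_path`, and `build_training_set`.

```python
import random

# Hypothetical toy MDP (not from the paper): states 0..3, state 3 is absorbing.
# TRANSITIONS[state][action] is a list of (next_state, probability) pairs.
TRANSITIONS = {
    0: {"a": [(1, 0.8), (0, 0.2)], "b": [(2, 1.0)]},
    1: {"a": [(3, 0.9), (0, 0.1)], "b": [(1, 1.0)]},
    2: {"a": [(3, 0.5), (2, 0.5)], "b": [(0, 1.0)]},
    3: {"a": [(3, 1.0)], "b": [(3, 1.0)]},
}

# Assumed observation function: states 0 and 2 are indistinguishable.
OBSERVATION = {0: "far", 1: "near", 2: "far", 3: "goal"}

# Assumed memoryless strategy on the fully observable MDP.
MDP_STRATEGY = {0: "a", 1: "a", 2: "a", 3: "a"}


def sample_path(start, horizon, rng):
    """Simulate one finite path and record (observation, action) pairs."""
    path = []
    state = start
    for _ in range(horizon):
        action = MDP_STRATEGY[state]
        path.append((OBSERVATION[state], action))
        successors, weights = zip(*TRANSITIONS[state][action])
        state = rng.choices(successors, weights=weights, k=1)[0]
    return path


def build_training_set(num_paths, horizon, seed=0):
    """Sample start states uniformly and collect observation-action sequences."""
    rng = random.Random(seed)
    states = list(TRANSITIONS)
    return [sample_path(rng.choice(states), horizon, rng) for _ in range(num_paths)]


if __name__ == "__main__":
    for sequence in build_training_set(num_paths=3, horizon=4):
        print(sequence)
```

Sequences of this shape are the kind of input a recurrent network could be trained on with a cross-entropy loss over actions. The hyperparameters for that step (learning rate, batch size, number of epochs) are exactly what the table reports as unspecified, so none are suggested here.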