Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks
Authors: Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, Ufuk Topcu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes. We evaluate our RNN-based synthesis procedure on benchmark examples that are subject to either LTL specifications or expected cost specifications. |
| Researcher Affiliation | Collaboration | ¹ The University of Texas at Austin; ² Radboud University, Nijmegen, The Netherlands; ³ Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany; ⁴ Concept Engineering GmbH, Freiburg im Breisgau, Germany |
| Pseudocode | No | The paper includes a flowchart (Figure 1) to illustrate the workflow, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper mentions the use of Keras and probabilistic model checkers like PRISM and Storm, but does not provide a link or explicit statement about the availability of the authors' source code for the described methodology. |
| Open Datasets | No | The paper describes generating data ("sample uniformly over all states of the MDP and generate finite paths... thereby creating multiple trajectory trees") or extending existing problem examples for benchmarks (e.g., Maze(c), Grid(c), Rock Sample), but does not provide concrete access information (link, DOI, formal citation with authors/year) for a publicly available or open dataset used in their experiments. |
| Dataset Splits | No | The paper describes creating a "training set D" from generated observation-action sequences but does not provide specific details on training, validation, and test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper states: "We evaluated on a 2.3 GHz machine with a 12 GB memory limit and a specified maximum computation time of 10⁵ seconds." This gives the clock speed and memory limit but no specific CPU/GPU models or other detailed machine specifications. |
| Software Dependencies | No | The paper mentions using "Keras" and the probabilistic model checkers "PRISM" and "Storm", but does not provide specific version numbers for these software components. |
| Experiment Setup | No | The paper mentions using the Adam optimizer with a cross-entropy error function, but it does not provide specific hyperparameter values such as learning rate, batch size, or number of epochs for the experimental setup. |
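To make the gap in the last row concrete: the paper reportedly trains its RNN policy with the Adam optimizer and a cross-entropy loss (in Keras), but omits the hyperparameters. The sketch below is a toy NumPy stand-in, not the authors' implementation; it trains a softmax policy over actions given one-hot observations with Adam and cross-entropy. All data, the learning rate, and the step count are assumptions chosen only to illustrate which values the paper would need to report.

```python
import numpy as np

# Toy stand-in for the paper's training loop: softmax policy over actions
# given one-hot observations, trained with Adam + cross-entropy.
# All hyperparameters below are assumptions, not values from the paper.
rng = np.random.default_rng(0)
n_obs, n_act, n_samples = 6, 4, 200

# Synthetic "trajectory" data: each observation has one preferred action.
obs_idx = rng.integers(0, n_obs, size=n_samples)
act_idx = obs_idx % n_act
X = np.eye(n_obs)[obs_idx]          # one-hot observations
Y = np.eye(n_act)[act_idx]          # one-hot target actions

W = np.zeros((n_obs, n_act))
m, v = np.zeros_like(W), np.zeros_like(W)
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8   # assumed Adam settings

def cross_entropy(W):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1)), p

losses = []
for t in range(1, 301):                     # assumed number of steps
    loss, p = cross_entropy(W)
    losses.append(loss)
    grad = X.T @ (p - Y) / n_samples        # gradient of mean cross-entropy
    m = b1 * m + (1 - b1) * grad            # Adam first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # Adam second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    W -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A full experiment-setup description would pin down exactly the quantities that are free parameters here: learning rate, batch size, number of epochs, and the RNN architecture itself.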