Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints

Authors: Steven Carr, Nils Jansen, Ufuk Topcu

IJCAI 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | The numerical experiments show that the proposed approach is up to three orders of magnitude faster than traditional POMDP synthesis methods while achieving values within 2% of the optimal benchmark.
Researcher Affiliation | Academia | Steven Carr (The University of Texas at Austin), Nils Jansen (Radboud University, Nijmegen, The Netherlands), and Ufuk Topcu (The University of Texas at Austin).
Pseudocode | No | The paper includes a procedural flowchart (Figure 2) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, 'We provide a Python toolchain that employs the probabilistic model checker Storm [Dehnert et al., 2017],' but it gives no link to the code implementing the described methodology and does not state that the code is released publicly (e.g., via a repository or supplementary materials).
Open Datasets | Yes | We analyze the method on three different settings: Maze(c), Grid(c), and Navigation(c); for detailed descriptions of these examples see [Norman et al., 2017] and [Carr et al., 2019].
Dataset Splits | No | The paper mentions training and retraining RNNs but does not specify training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits) used in the experiments.
Hardware Specification | Yes | All experiments are run on a 2.3 GHz machine with a 12 GB memory limit and a maximum computation time of 10^5 seconds.
Software Dependencies | No | The paper mentions Python, the probabilistic model checker Storm, and the deep learning library Keras, but it does not provide version numbers for these software components.
Experiment Setup | Yes | For proper comparison with the synthesis tools we adopt the experiment setup of [Carr et al., 2019], whereby we always iterate for 10 instances of retraining the RNN from counterexample data. ... In this trivial example, H(Crit^M_{A1}) = −p log₂(p) − (1−p) log₂(1−p) = 0 from (2). The average entropy is below the prescribed threshold, η = 0.5, and thus we increase the number of memory nodes, which results in the satisfying FSC A2 in Fig. 3c.
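The entropy test in the quoted setup is the standard binary (Shannon) entropy compared against the threshold η = 0.5. A minimal sketch of that check, assuming the function name and the deterministic trivial case (p ∈ {0, 1}) for illustration; the paper's toolchain itself is not publicly available:

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) distribution:
    H(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic choice carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Memory-increase test from the quoted setup: if the average entropy
# over the critical states falls below the threshold eta = 0.5, increase
# the number of memory nodes in the finite-state controller (FSC).
ETA = 0.5
avg_entropy = binary_entropy(1.0)  # trivial example: p in {0, 1} gives H = 0
increase_memory = avg_entropy < ETA
```

With `p = 0.5` the entropy is maximal at 1 bit; in the trivial example above it is 0, which is below η = 0.5, triggering the memory increase.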