Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
Authors: Steven Carr, Nils Jansen, Ufuk Topcu
JAIR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the RNN-based synthesis on benchmark examples that are subject to either LTL specifications or expected reward specifications. For the former, we compare to the tool PRISM-POMDP (Norman et al., 2017), and for the latter to PRISM-POMDP and the point-based solver SolvePOMDP (Walraven & Spaan, 2017). ... In Tables 3 and 4, TO/MO denote violations of the time/memory limit, respectively, and Res. refers to the output value of the induced DTMC. |
| Researcher Affiliation | Academia | Steven Carr EMAIL The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA Nils Jansen EMAIL Faculty of Science, University of Nijmegen, Postbus 9010, Nijmegen 6500GL, The Netherlands Ufuk Topcu EMAIL The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA |
| Pseudocode | No | The paper describes processes with flowcharts (Figures 1, 3, 5, 6, 7, 8) and mathematical definitions, but it does not contain any explicitly labeled pseudocode or algorithm blocks with structured steps in a code-like format. |
| Open Source Code | No | The paper mentions using third-party tools like Keras, PRISM, and STORM, but it does not provide any statement, specific link, or reference indicating that the authors' own implementation code for the described methodology is publicly available or released. |
| Open Datasets | Yes | Additionally, we compare against the parametric benchmark Rock Sample[c, m]: 1. ... For further details on the action set, the observation and reward functions see (Smith & Simmons, 2004). |
| Dataset Splits | No | The paper discusses generating sequences of data for training the RNN-based policy and provides metrics like the size of the initial training data set, but it does not specify explicit training/validation/test splits with percentages, sample counts, or references to predefined splits for any dataset used. |
| Hardware Specification | No | We evaluated on a 2.3 GHz machine with a 12 GB memory limit and a specified maximum computation time of 10^5 seconds. This description is too general: it lacks the specific CPU or GPU models required for hardware reproducibility. |
| Software Dependencies | No | First, we use the deep learning library Keras (Ketkar, 2017) to train the RNN-based policy from sequences of data. To evaluate policies, we employ the probabilistic model checkers PRISM (Norman et al., 2017) and STORM (Dehnert et al., 2017) for LTL and undiscounted expected reward respectively. No specific version numbers are provided for Keras, PRISM, or STORM. |
| Experiment Setup | No | To fit the RNN model to the sequences of training data, we use the Adam optimizer (Kingma & Ba, 2015) with a categorical cross-entropy error function (Goodfellow et al., 2016). While the architecture (three-layer LSTM, softmax) and optimization algorithm (Adam) are mentioned, specific hyperparameter values such as learning rate, batch size, or number of epochs are not provided in the main text. |
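To make the gap concrete, the setup quoted above (a three-layer LSTM with a softmax output, trained with Adam and categorical cross-entropy in Keras) can be sketched as follows. This is a hypothetical reconstruction, not the authors' released code: the layer widths, sequence length, and observation/action dimensions are illustrative assumptions, precisely the hyperparameters the paper does not report.

```python
# Hypothetical sketch of the paper's stated setup: three stacked LSTM
# layers, a softmax action head, Adam optimizer, categorical cross-entropy.
# All dimensions below are assumed, not taken from the paper.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, N_OBS, N_ACT = 8, 16, 4  # assumed: history length, #observations, #actions

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, N_OBS)),       # one-hot observation histories
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32),                            # three-layer LSTM, as quoted
    layers.Dense(N_ACT, activation="softmax"),  # distribution over actions
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# One gradient step on random placeholder data, standing in for the
# observation-action training sequences the paper generates.
x = np.random.rand(2, SEQ_LEN, N_OBS).astype("float32")
y = keras.utils.to_categorical(np.random.randint(N_ACT, size=2), N_ACT)
model.train_on_batch(x, y)
```

Even this minimal sketch forces choices (hidden size 32, batch size 2) that a reader cannot recover from the main text, which is exactly why the "Experiment Setup" variable is scored No.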