Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Authors: Steven Carr, Nils Jansen, Ufuk Topcu

JAIR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"We evaluate the RNN-based synthesis on benchmark examples that are subject to either LTL specifications or expected reward specifications. For the former, we compare to the tool PRISM-POMDP (Norman et al., 2017), and for the latter to PRISM-POMDP and the point-based solver SolvePOMDP (Walraven & Spaan, 2017). ... In Tables 3 and 4, TO/MO denote violations of the time/memory limit, respectively, and Res. refers to the output value of the induced DTMC."

Researcher Affiliation | Academia
"Steven Carr (EMAIL), The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA; Nils Jansen (EMAIL), Faculty of Science, University of Nijmegen, Postbus 9010, Nijmegen 6500GL, The Netherlands; Ufuk Topcu (EMAIL), The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA"

Pseudocode | No
The paper describes its processes with flowcharts (Figures 1, 3, 5, 6, 7, 8) and mathematical definitions, but it contains no explicitly labeled pseudocode or algorithm blocks with structured, code-like steps.

Open Source Code | No
The paper mentions third-party tools such as Keras, PRISM, and STORM, but it provides no statement, link, or reference indicating that the authors' own implementation of the described methodology is publicly available.

Open Datasets | Yes
"Additionally, we compare against the parametric benchmark Rock Sample[c, m]: 1. ... For further details on the action set, the observation and reward functions see (Smith & Simmons, 2004)."

Dataset Splits | No
The paper discusses generating sequences of data for training the RNN-based policy and reports metrics such as the size of the initial training data set, but it does not specify explicit training/validation/test splits (percentages, sample counts, or references to predefined splits) for any dataset used.

Hardware Specification | No
"We evaluated on a 2.3 GHz machine with a 12 GB memory limit and a specified maximum computation time of 10^5 seconds." This description is too general: it lacks the specific CPU or GPU models required for hardware reproducibility.

Software Dependencies | No
"First, we use the deep learning library Keras (Ketkar, 2017) to train the RNN-based policy from sequences of data. To evaluate policies, we employ the probabilistic model checkers PRISM (Norman et al., 2017) and STORM (Dehnert et al., 2017) for LTL and undiscounted expected reward, respectively." No version numbers are provided for Keras, PRISM, or STORM.

Experiment Setup | No
"To fit the RNN model to the sequences of training data, we use the Adam optimizer (Kingma & Ba, 2015) with a categorical cross-entropy error function (Goodfellow et al., 2016)." While the architecture (a three-layer LSTM with softmax output) and the optimization algorithm (Adam) are mentioned, specific hyperparameter values such as learning rate, batch size, or number of epochs are not provided in the main text.
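The experiment-setup row notes that the paper names its architecture (a three-layer LSTM with a softmax output) and its training recipe (Adam with categorical cross-entropy, via Keras) but reports no hyperparameter values. A minimal Keras sketch of such a policy network is shown below; the sequence length, observation/action dimensions, hidden-layer size, and learning rate are illustrative placeholders, not values from the paper:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative dimensions -- the paper does not report these values.
SEQ_LEN, OBS_DIM, ACT_DIM, HIDDEN = 8, 4, 3, 32

# Three stacked LSTM layers with a softmax head, mapping an
# observation sequence to a distribution over actions.
model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, OBS_DIM)),
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN),
    layers.Dense(ACT_DIM, activation="softmax"),
])

# Adam with categorical cross-entropy, as stated in the paper; the
# learning rate here is Keras's default, not a value from the paper.
model.compile(optimizer=keras.optimizers.Adam(),
              loss="categorical_crossentropy")

# Fit on random stand-in sequences (the paper's training data is not
# released, per the Open Source Code and Open Datasets rows above).
x = np.random.rand(16, SEQ_LEN, OBS_DIM).astype("float32")
y = keras.utils.to_categorical(np.random.randint(ACT_DIM, size=16), ACT_DIM)
model.fit(x, y, epochs=1, verbose=0)
probs = model.predict(x, verbose=0)  # one action distribution per sequence
```

Because none of the hyperparameters are reported, a reproduction attempt would have to sweep or guess these values, which is exactly why this variable is scored "No".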