Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Authors: Steven Carr, Nils Jansen, Ufuk Topcu

JAIR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"We evaluate the RNN-based synthesis on benchmark examples that are subject to either LTL specifications or expected reward specifications. For the former, we compare to the tool PRISM-POMDP (Norman et al., 2017), and for the latter to PRISM-POMDP and the point-based solver SolvePOMDP (Walraven & Spaan, 2017). ... In Tables 3 and 4, TO/MO denote violations of the time/memory limit, respectively, and Res. refers to the output value of the induced DTMC."

Researcher Affiliation | Academia
"Steven Carr (EMAIL), The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA; Nils Jansen (EMAIL), Faculty of Science, University of Nijmegen, Postbus 9010, Nijmegen 6500GL, The Netherlands; Ufuk Topcu (EMAIL), The University of Texas at Austin, 2617 Wichita Street, C0600, Austin, Texas 78712-1221, USA"

Pseudocode | No
The paper describes its processes with flowcharts (Figures 1, 3, 5, 6, 7, 8) and mathematical definitions, but it contains no explicitly labeled pseudocode or algorithm blocks with structured, code-like steps.

Open Source Code | No
The paper mentions third-party tools such as Keras, PRISM, and STORM, but it provides no statement, link, or reference indicating that the authors' own implementation of the described methodology is publicly available.

Open Datasets | Yes
"Additionally, we compare against the parametric benchmark Rock Sample[c, m]: 1. ... For further details on the action set, the observation and reward functions see (Smith & Simmons, 2004)."

Dataset Splits | No
The paper discusses generating sequences of data for training the RNN-based policy and reports metrics such as the size of the initial training data set, but it does not specify explicit training/validation/test splits (percentages, sample counts, or references to predefined splits) for any dataset used.

Hardware Specification | No
"We evaluated on a 2.3 GHz machine with a 12 GB memory limit and a specified maximum computation time of 10^5 seconds." This description is too general: it lacks the specific CPU or GPU models required for hardware reproducibility.

Software Dependencies | No
"First, we use the deep learning library Keras (Ketkar, 2017) to train the RNN-based policy from sequences of data. To evaluate policies, we employ the probabilistic model checkers PRISM (Norman et al., 2017) and STORM (Dehnert et al., 2017) for LTL and undiscounted expected reward, respectively." No version numbers are provided for Keras, PRISM, or STORM.

Experiment Setup | No
"To fit the RNN model to the sequences of training data, we use the Adam optimizer (Kingma & Ba, 2015) with a categorical cross-entropy error function (Goodfellow et al., 2016)." While the architecture (a three-layer LSTM with softmax output) and the optimization algorithm (Adam) are mentioned, specific hyperparameter values such as learning rate, batch size, or number of epochs are not provided in the main text.
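The experiment-setup row notes that the paper names its architecture (a three-layer LSTM with a softmax output) and its training recipe (Adam with categorical cross-entropy, via Keras) but reports no hyperparameter values. A minimal Keras sketch of such a policy network is shown below; the sequence length, observation/action dimensions, hidden-layer size, and learning rate are illustrative placeholders, not values from the paper:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative dimensions -- the paper does not report these values.
SEQ_LEN, OBS_DIM, ACT_DIM, HIDDEN = 8, 4, 3, 32

# Three stacked LSTM layers with a softmax head, mapping an
# observation sequence to a distribution over actions.
model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, OBS_DIM)),
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN),
    layers.Dense(ACT_DIM, activation="softmax"),
])

# Adam with categorical cross-entropy, as stated in the paper; the
# learning rate here is Keras's default, not a value from the paper.
model.compile(optimizer=keras.optimizers.Adam(),
              loss="categorical_crossentropy")

# Fit on random stand-in sequences (the paper's training data is not
# released, per the Open Source Code and Open Datasets rows above).
x = np.random.rand(16, SEQ_LEN, OBS_DIM).astype("float32")
y = keras.utils.to_categorical(np.random.randint(ACT_DIM, size=16), ACT_DIM)
model.fit(x, y, epochs=1, verbose=0)
probs = model.predict(x, verbose=0)  # one action distribution per sequence
```

Because none of the hyperparameters are reported, a reproduction attempt would have to sweep or guess these values, which is exactly why this variable is scored "No".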