LTL-Constrained Steady-State Policy Synthesis

Authors: Jan Křetínský

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study Markov decision processes (MDP) with the specifications combining all these three types, yielding a more balanced and holistic perspective. We synthesize a policy maximizing the LRA reward among all policies ensuring the LTL specification (with the given probability) and adhering to the steady-state constraints. To this end, we provide a unified solution conceptually reducing the problem with the heterogeneous specification combining three types of properties to a problem with a single-type specification, namely the multi-dimensional LRA reward. This in turn can be solved using a single simple linear programme, which is easy to understand and shorter than previous solutions to the special cases considered in the literature. Not only does the unified approach allow us to use the powerful results from literature on rewards, but it also allows for easy extensions and modifications such as considering various classes of policies (depending on the memory available or further requirements on the induced chain) or variants of the objectives (e.g. constraints on frequency of satisfying recurrent goals, multiple rewards, trade-offs between objectives), which we also describe. Our reduction is particularly elegant due to the use of Limit Deterministic Büchi Automata (LDBA), recently studied in the context of LTL model checking on MDP. Moreover, the solution extends trivially from LTL to the general ω-regular properties (given as automata) and runs in time polynomial in the sizes of the MDP as well as the LDBA. (A hypothetical sketch of the MDP–LDBA product construction that this reduction relies on is given below the table.)
Researcher Affiliation | Academia | Jan Křetínský, Technical University of Munich, jan.kretinsky@tum.de
Pseudocode | Yes | Figure 3: Linear program for policy synthesis in general (multichain) MDP, and Figure 4: Linear program for the long-run specification on the product MDP (together with the policy-flow constraints of the LP in Fig. 3). (A simplified, illustrative steady-state LP is sketched below the table.)
Open Source Code | No | The paper mentions external tools (PRISM, Rabinizer 4, Mochiba) but does not provide any statement or link indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets | No | The paper is theoretical and does not involve the use of datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings.
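The reduction quoted above hinges on taking the product of the MDP with an LDBA and then treating the Büchi acceptance as a long-run condition on the product. The following is a minimal, hypothetical sketch of that product construction: the data format, the function name `build_product`, and the toy input are invented for illustration (a real pipeline would obtain the LDBA from a tool such as Rabinizer 4, which the paper mentions), and the convention that the automaton reads the label of the state being left is only one of several possible.

```python
from itertools import product as cartesian

def build_product(mdp_states, mdp_actions, P, label,
                  aut_states, aut_delta, aut_eps, aut_acc):
    """Build the MDP x LDBA product (one common convention: the automaton
    reads the label of the MDP state that is being left).

    P[s][a]             : dict successor -> probability
    label[s]            : frozenset of atomic propositions true in s
    aut_delta[q][sigma] : unique successor automaton state on letter sigma
    aut_eps[q]          : automaton states reachable from q by an epsilon jump
    aut_acc             : set of accepting (Buchi) automaton states
    """
    prod_P = {}  # (s, q) -> product action -> dict successor -> probability
    for s, q in cartesian(mdp_states, aut_states):
        moves = {}
        # Synchronous moves: play an MDP action while the automaton reads label(s).
        for a in mdp_actions:
            q2 = aut_delta[q][label[s]]
            moves[("act", a)] = {(s2, q2): p for s2, p in P[s][a].items()}
        # Epsilon jumps: only the automaton moves; in an LDBA the product policy
        # decides when to commit to the deterministic accepting component.
        for q2 in aut_eps.get(q, ()):
            moves[("jump", q2)] = {(s, q2): 1.0}
        prod_P[(s, q)] = moves
    accepting = {(s, q) for s in mdp_states for q in aut_acc}
    return prod_P, accepting

# Purely structural toy input (not the LDBA of any particular LTL formula).
S, A = [0], ["a"]
P = {0: {"a": {0: 1.0}}}
lab = {0: frozenset({"p"})}
Q, acc = ["q0", "q1"], {"q1"}
delta = {q: {frozenset({"p"}): q, frozenset(): q} for q in Q}
eps = {"q0": {"q1"}}
prod_P, acc_states = build_product(S, A, P, lab, Q, delta, eps, acc)
print(prod_P[(0, "q0")])  # shows both the synchronous move and the jump to q1
```

On such a product, satisfying the Büchi condition amounts to visiting accepting product states recurrently, which is what lets the paper fold the LTL part into long-run (frequency-style) constraints alongside the LRA reward.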
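The LPs of Figures 3 and 4 themselves are not reproduced here, but the single-LP idea can be illustrated on the simplest special case: the classical steady-state LP for long-run average reward in a unichain MDP, with an extra frequency bound playing the role of a steady-state constraint. The sketch below is an assumption-laden illustration, not the paper's general multichain LP; the toy transition matrix, rewards, and the bound 0.4 are all invented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP: states S = {0, 1}, actions A = {0, 1}
# P[s, a, s'] = transition probability, r[s, a] = reward (all values made up).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
nS, nA = r.shape

# LP variables x[s, a] = long-run frequency of being in s and playing a.
# Objective: maximize sum_{s,a} x[s,a] * r[s,a]  (linprog minimizes, so negate).
c = -r.flatten()

# Flow (stationarity) constraints: for every state t,
#   sum_a x[t, a] = sum_{s,a} x[s, a] * P[s, a, t]
A_eq, b_eq = [], []
for t in range(nS):
    row = np.zeros(nS * nA)
    for s in range(nS):
        for a in range(nA):
            row[s * nA + a] += P[s, a, t]
            if s == t:
                row[s * nA + a] -= 1.0
    A_eq.append(row)
    b_eq.append(0.0)

# Normalization: frequencies sum to 1.
A_eq.append(np.ones(nS * nA))
b_eq.append(1.0)

# Illustrative steady-state constraint: state 0 is visited at most 40% of the
# time, i.e. sum_a x[0, a] <= 0.4 (an arbitrary bound chosen for the example).
A_ub = [np.array([1.0, 1.0, 0.0, 0.0])]
b_ub = [0.4]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
print("LRA reward:", -res.fun)

# Assuming every state gets positive frequency (true for this toy instance),
# a memoryless policy is read off by normalizing the frequencies per state.
policy = x / x.sum(axis=1, keepdims=True)
print("policy:", policy)
```

Normalizing the optimal frequencies per state is the standard way such LP solutions are turned into randomized memoryless policies; the paper's general (multichain) LP works with a richer set of variables but follows the same flow-constraint pattern.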