LTL-Constrained Steady-State Policy Synthesis

Authors: Jan Křetínský

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study Markov decision processes (MDP) with the specifications combining all these three types, yielding a more balanced and holistic perspective. We synthesize a policy maximizing the LRA reward among all policies ensuring the LTL specification (with the given probability) and adhering to the steady-state constraints. To this end, we provide a unified solution conceptually reducing the problem with the heterogeneous specification combining three types of properties to a problem with a single-type specification, namely the multi-dimensional LRA reward. This in turn can be solved using a single simple linear programme, which is easy to understand and shorter than previous solutions to the special cases considered in the literature. Not only does the unified approach allow us to use the powerful results from literature on rewards, but it also allows for easy extensions and modifications such as considering various classes of policies (depending on the memory available or further requirements on the induced chain) or variants of the objectives (e.g. constraints on frequency of satisfying recurrent goals, multiple rewards, trade-offs between objectives), which we also describe. Our reduction is particularly elegant due to the use of Limit Deterministic Büchi Automata (LDBA), recently studied in the context of LTL model checking on MDP. Moreover, the solution extends trivially from LTL to the general ω-regular properties (given as automata) and runs in time polynomial in the sizes of the MDP as well as the LDBA. (A hypothetical sketch of the MDP–LDBA product construction that this reduction relies on is given below the table.)
Researcher Affiliation | Academia | Jan Křetínský, Technical University of Munich, jan.kretinsky@tum.de
Pseudocode | Yes | Figure 3: Linear program for policy synthesis in general (multichain) MDP, and Figure 4: Linear program for the long-run specification on the product MDP (together with the policy-flow constraints of the LP in Fig. 3). (A simplified, illustrative steady-state LP is sketched below the table.)
Open Source Code | No | The paper mentions external tools (PRISM, Rabinizer 4, Mochiba) but does not provide any statement or link indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets | No | The paper is theoretical and does not involve the use of datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings.
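The reduction quoted above hinges on taking the product of the MDP with an LDBA and then treating the Büchi acceptance as a long-run condition on the product. The following is a minimal, hypothetical sketch of that product construction: the data format, the function name `build_product`, and the toy input are invented for illustration (a real pipeline would obtain the LDBA from a tool such as Rabinizer 4, which the paper mentions), and the convention that the automaton reads the label of the state being left is only one of several possible.

```python
from itertools import product as cartesian

def build_product(mdp_states, mdp_actions, P, label,
                  aut_states, aut_delta, aut_eps, aut_acc):
    """Build the MDP x LDBA product (one common convention: the automaton
    reads the label of the MDP state that is being left).

    P[s][a]             : dict successor -> probability
    label[s]            : frozenset of atomic propositions true in s
    aut_delta[q][sigma] : unique successor automaton state on letter sigma
    aut_eps[q]          : automaton states reachable from q by an epsilon jump
    aut_acc             : set of accepting (Buchi) automaton states
    """
    prod_P = {}  # (s, q) -> product action -> dict successor -> probability
    for s, q in cartesian(mdp_states, aut_states):
        moves = {}
        # Synchronous moves: play an MDP action while the automaton reads label(s).
        for a in mdp_actions:
            q2 = aut_delta[q][label[s]]
            moves[("act", a)] = {(s2, q2): p for s2, p in P[s][a].items()}
        # Epsilon jumps: only the automaton moves; in an LDBA the product policy
        # decides when to commit to the deterministic accepting component.
        for q2 in aut_eps.get(q, ()):
            moves[("jump", q2)] = {(s, q2): 1.0}
        prod_P[(s, q)] = moves
    accepting = {(s, q) for s in mdp_states for q in aut_acc}
    return prod_P, accepting

# Purely structural toy input (not the LDBA of any particular LTL formula).
S, A = [0], ["a"]
P = {0: {"a": {0: 1.0}}}
lab = {0: frozenset({"p"})}
Q, acc = ["q0", "q1"], {"q1"}
delta = {q: {frozenset({"p"}): q, frozenset(): q} for q in Q}
eps = {"q0": {"q1"}}
prod_P, acc_states = build_product(S, A, P, lab, Q, delta, eps, acc)
print(prod_P[(0, "q0")])  # shows both the synchronous move and the jump to q1
```

On such a product, satisfying the Büchi condition amounts to visiting accepting product states recurrently, which is what lets the paper fold the LTL part into long-run (frequency-style) constraints alongside the LRA reward.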
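The LPs of Figures 3 and 4 themselves are not reproduced here, but the single-LP idea can be illustrated on the simplest special case: the classical steady-state LP for long-run average reward in a unichain MDP, with an extra frequency bound playing the role of a steady-state constraint. The sketch below is an assumption-laden illustration, not the paper's general multichain LP; the toy transition matrix, rewards, and the bound 0.4 are all invented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP: states S = {0, 1}, actions A = {0, 1}
# P[s, a, s'] = transition probability, r[s, a] = reward (all values made up).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
nS, nA = r.shape

# LP variables x[s, a] = long-run frequency of being in s and playing a.
# Objective: maximize sum_{s,a} x[s,a] * r[s,a]  (linprog minimizes, so negate).
c = -r.flatten()

# Flow (stationarity) constraints: for every state t,
#   sum_a x[t, a] = sum_{s,a} x[s, a] * P[s, a, t]
A_eq, b_eq = [], []
for t in range(nS):
    row = np.zeros(nS * nA)
    for s in range(nS):
        for a in range(nA):
            row[s * nA + a] += P[s, a, t]
            if s == t:
                row[s * nA + a] -= 1.0
    A_eq.append(row)
    b_eq.append(0.0)

# Normalization: frequencies sum to 1.
A_eq.append(np.ones(nS * nA))
b_eq.append(1.0)

# Illustrative steady-state constraint: state 0 is visited at most 40% of the
# time, i.e. sum_a x[0, a] <= 0.4 (an arbitrary bound chosen for the example).
A_ub = [np.array([1.0, 1.0, 0.0, 0.0])]
b_ub = [0.4]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
print("LRA reward:", -res.fun)

# Assuming every state gets positive frequency (true for this toy instance),
# a memoryless policy is read off by normalizing the frequencies per state.
policy = x / x.sum(axis=1, keepdims=True)
print("policy:", policy)
```

Normalizing the optimal frequencies per state is the standard way such LP solutions are turned into randomized memoryless policies; the paper's general (multichain) LP works with a richer set of variables but follows the same flow-constraint pattern.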