Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LTL-Constrained Steady-State Policy Synthesis
Authors: Jan Křetínský
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study Markov decision processes (MDP) with the specifications combining all these three types, yielding a more balanced and holistic perspective. We synthesize a policy maximizing the LRA reward among all policies ensuring the LTL specification (with the given probability) and adhering to the steady-state constraints. To this end, we provide a unified solution conceptually reducing the problem with the heterogeneous specification combining three types of properties to a problem with a single-type specification, namely the multi-dimensional LRA reward. This in turn can be solved using a single simple linear programme, which is easy to understand and shorter than previous solutions to the special cases considered in the literature. Not only does the unified approach allow us to use the powerful results from literature on rewards, but it also allows for easy extensions and modifications such as considering various classes of policies (depending on the memory available or further requirements on the induced chain) or variants of the objectives (e.g. constraints on frequency of satisfying recurrent goals, multiple rewards, trade-offs between objectives), which we also describe. Our reduction is particularly elegant due to the use of Limit Deterministic Büchi Automata (LDBA), recently studied in the context of LTL model checking on MDP. Moreover, the solution extends trivially from LTL to the general ω-regular properties (given as automata) and runs in time polynomial in the sizes of the MDP as well as the LDBA. |
| Researcher Affiliation | Academia | Jan Křetínský, Technical University of Munich, EMAIL |
| Pseudocode | Yes | Figure 3: Linear program for policy synthesis in general (multichain) MDP and Figure 4: Linear program for the long-run specification on the product MDP (together with the policy-flow constraints of the LP in Fig. 3) |
| Open Source Code | No | The paper mentions external tools (PRISM, Rabinizer 4, Mochiba) but does not provide any statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | No | The paper is theoretical and does not involve the use of datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |
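The core technique quoted above reduces the heterogeneous specification to a single linear program over long-run state-action frequencies (an occupation measure). The following is a minimal sketch of that LP idea on a toy 2-state MDP with an illustrative steady-state bound; the MDP, rewards, and the 0.5 frequency cap are assumptions for illustration, not the paper's (multichain, LDBA-product) construction.

```python
# Sketch: maximize long-run average (LRA) reward subject to a
# steady-state constraint, via the classic occupation-measure LP
# for a small MDP. All concrete numbers here are illustrative.
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2                      # states, actions
# P[s, a, s'] = transition probability
P = np.zeros((S, A, S))
P[0, 0, 0] = 1.0                 # state 0, action 0: stay (reward 1)
P[0, 1, 1] = 1.0                 # state 0, action 1: move to state 1
P[1, 0, 0] = 1.0                 # state 1, action 0: move back to 0
P[1, 1, 1] = 1.0                 # state 1, action 1: stay (reward 2)
r = np.array([[1.0, 0.0],        # r[s, a]
              [0.0, 2.0]])

# Variables x[s, a]: long-run frequency of being in s and playing a.
n = S * A
idx = lambda s, a: s * A + a

# Flow balance: for each state, outflow equals inflow;
# plus normalization: frequencies sum to 1.
A_eq = np.zeros((S + 1, n))
for s in range(S):
    for a in range(A):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(S):
        for a in range(A):
            A_eq[s, idx(sp, a)] -= P[sp, a, s]
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# Steady-state constraint: visit state 1 at most half of the time.
A_ub = np.zeros((1, n))
A_ub[0, idx(1, 0)] = A_ub[0, idx(1, 1)] = 1.0
b_ub = np.array([0.5])

# Maximize expected LRA reward (linprog minimizes, so negate).
res = linprog(c=-r.flatten(), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
print("optimal LRA reward:", -res.fun)   # 1.5 under the 0.5 bound

# Recover a randomized memoryless policy from the occupation measure.
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)
print(policy)
```

The paper's actual LP (Figures 3 and 4, per the Pseudocode row) additionally handles multichain MDPs and the product with an LDBA for the LTL part; this sketch only shows why a single LP suffices once everything is expressed as long-run frequencies.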