Beyond Strong-Cyclic: Doing Your Best in Stochastic Environments

Authors: Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin, Florian Zuleger

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce best-effort policies for (not necessarily Markovian) stochastic domains. These generalize strong-cyclic policies by taking advantage of stochasticity even if the goal cannot be reached with probability 1. We compare such policies with optimal policies, i.e., policies that maximize the probability that the goal is achieved, and show that optimal policies are best-effort, but that the converse is false in general. All this shows that best-effort policies are robust to changes in the probabilities, as long as the support is unchanged. Theorem 1. There exists a SBE policy for (D, G). The proof of this result is an adaptation of the fact that best-effort policies exist in the context of synthesis under assumptions [Aminof et al., 2020a]. An illustration of the optimal-versus-best-effort distinction appears after the table.
Researcher Affiliation | Academia | Benjamin Aminof¹, Giuseppe De Giacomo², Sasha Rubin³ and Florian Zuleger¹; ¹TU Vienna, Austria; ²University of Rome "La Sapienza", Italy; ³University of Sydney, Australia; {aminof, zuleger}@forsyte.at, degiacomo@diag.uniroma1.it, sasha.rubin@sydney.edu.au
Pseudocode | No | The algorithm for Part 1 is simple. Given the support δ : Obs × Act → 2^Obs \ {∅} and LTL/LTLf formula ϕ: Step 1. Fix a similar finite MDP D = (F, A, s0, Pr) by letting Pr(s, a) be, e.g., the uniform distribution over δ(s, a): Pr(s, a)(s') := 1/|δ(s, a)| for s' ∈ δ(s, a). Step 2. Return a finite-state optimal policy σ for the stochastic planning problem (D, ϕ). This can be computed using any standard technique, e.g., [Bianco and de Alfaro, 1995]. This is a textual description of steps, not a structured pseudocode block. A sketch of these two steps, specialized to a reachability goal, appears after the table.
Open Source Code | No | The paper is theoretical and does not mention releasing any source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not use or reference any datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve dataset splits (training, validation, test) for empirical evaluation.
Hardware Specification | No | The paper is theoretical and does not specify any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations.
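The following one-step illustration is not taken from the paper; the two-action domain and the names P, P_prime, a, b are hypothetical. It only unpacks the claims quoted in the Research Type row: optimal policies are best-effort, the converse fails in general, and best-effortness depends only on the support of the transition probabilities.

```python
# Hypothetical one-step domain (not from the paper): two actions with the same
# support {goal, trap} but different probabilities of reaching the goal.
domains = {
    "P":       {"a": 0.9, "b": 0.1},   # under P the optimal policy picks a
    "P_prime": {"a": 0.1, "b": 0.9},   # same supports, probabilities swapped
}

for name, goal_prob in domains.items():
    best = max(goal_prob, key=goal_prob.get)
    print(f"{name}: optimal choice is '{best}' with goal probability {goal_prob[best]}")

# Picking b is optimal, hence best-effort, under P_prime. Since P has the same
# supports, the quoted robustness property makes that same choice best-effort
# under P as well, where it is clearly not optimal (0.1 < 0.9): best-effort does
# not imply optimal, while optimal always implies best-effort.
```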
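The sketch below, also not from the paper, follows the two steps quoted in the Pseudocode row, but replaces the general LTL/LTLf goal ϕ with a plain reachability goal (the paper defers the general case to standard techniques such as [Bianco and de Alfaro, 1995]). The helper names make_uniform_mdp and optimal_reachability_policy, and the toy domain, are illustrative assumptions.

```python
# Sketch of the quoted two-step algorithm, specialized to a reachability goal.
# All names and the toy domain below are illustrative, not taken from the paper.

def make_uniform_mdp(support):
    """Step 1: complete a support function delta(s, a) -> nonempty successor set
    into an MDP by putting the uniform distribution on each support set."""
    return {(s, a): {t: 1.0 / len(succs) for t in succs}
            for (s, a), succs in support.items()}

def optimal_reachability_policy(mdp, states, goal, iters=1000):
    """Step 2 (reachability case): value iteration for the maximal probability
    of eventually reaching a goal state, plus one maximizing action per state."""
    value = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s in goal:
                continue
            qs = [sum(p * value[t] for t, p in dist.items())
                  for (s2, a), dist in mdp.items() if s2 == s]
            if qs:
                value[s] = max(qs)
    best = {}
    for (s, a), dist in mdp.items():
        q = sum(p * value[t] for t, p in dist.items())
        if s not in best or q > best[s][0]:
            best[s] = (q, a)
    return {s: a for s, (q, a) in best.items()}, value

# Toy domain: from s0, 'try' may reach the goal or a dead end; 'give_up' cannot.
# No strong-cyclic policy exists (the goal is not reachable with probability 1),
# but the optimal policy still picks 'try'.
support = {
    ("s0", "try"): {"goal", "dead"},
    ("s0", "give_up"): {"dead"},
    ("dead", "stay"): {"dead"},
    ("goal", "stay"): {"goal"},
}
mdp = make_uniform_mdp(support)
policy, value = optimal_reachability_policy(mdp, {"s0", "goal", "dead"}, {"goal"})
print(policy["s0"], value["s0"])  # expected: try 0.5 under the uniform completion
```

In this particular toy domain, changing the transition probabilities while keeping each support set fixed leaves the maximizing action at s0 unchanged, which matches the quoted robustness of best-effort policies to probability changes that preserve the support.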