Beyond Strong-Cyclic: Doing Your Best in Stochastic Environments

Authors: Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin, Florian Zuleger

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce best-effort policies for (not necessarily Markovian) stochastic domains. These generalize strong-cyclic policies by taking advantage of stochasticity even if the goal cannot be reached with probability 1. We compare such policies with optimal policies, i.e., policies that maximize the probability that the goal is achieved, and show that optimal policies are best-effort, but that the converse is false in general. All this shows that best-effort policies are robust to changes in the probabilities, as long as the support is unchanged. Theorem 1. There exists a SBE policy for (D, G). The proof of this result is an adaptation of the fact that best-effort policies exist in the context of synthesis under assumptions [Aminof et al., 2020a]. An illustration of the optimal-versus-best-effort distinction appears after the table.
Researcher Affiliation | Academia | Benjamin Aminof¹, Giuseppe De Giacomo², Sasha Rubin³ and Florian Zuleger¹; ¹TU Vienna, Austria; ²University of Rome "La Sapienza", Italy; ³University of Sydney, Australia; {aminof, zuleger}@forsyte.at, degiacomo@diag.uniroma1.it, sasha.rubin@sydney.edu.au
Pseudocode | No | The algorithm for Part 1 is simple. Given the support δ : Obs × Act → 2^Obs \ {∅} and LTL/LTLf formula ϕ: Step 1. Fix a similar finite MDP D = (F, A, s0, Pr) by letting Pr(s, a) be, e.g., the uniform distribution over δ(s, a): Pr(s, a)(s') := 1/|δ(s, a)| for s' ∈ δ(s, a). Step 2. Return a finite-state optimal policy σ for the stochastic planning problem (D, ϕ). This can be computed using any standard technique, e.g., [Bianco and de Alfaro, 1995]. This is a textual description of steps, not a structured pseudocode block. A sketch of these two steps, specialized to a reachability goal, appears after the table.
Open Source Code | No | The paper is theoretical and does not mention releasing any source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not use or reference any datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve dataset splits (training, validation, test) for empirical evaluation.
Hardware Specification | No | The paper is theoretical and does not specify any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations.
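The following one-step illustration is not taken from the paper; the two-action domain and the names P, P_prime, a, b are hypothetical. It only unpacks the claims quoted in the Research Type row: optimal policies are best-effort, the converse fails in general, and best-effortness depends only on the support of the transition probabilities.

```python
# Hypothetical one-step domain (not from the paper): two actions with the same
# support {goal, trap} but different probabilities of reaching the goal.
domains = {
    "P":       {"a": 0.9, "b": 0.1},   # under P the optimal policy picks a
    "P_prime": {"a": 0.1, "b": 0.9},   # same supports, probabilities swapped
}

for name, goal_prob in domains.items():
    best = max(goal_prob, key=goal_prob.get)
    print(f"{name}: optimal choice is '{best}' with goal probability {goal_prob[best]}")

# Picking b is optimal, hence best-effort, under P_prime. Since P has the same
# supports, the quoted robustness property makes that same choice best-effort
# under P as well, where it is clearly not optimal (0.1 < 0.9): best-effort does
# not imply optimal, while optimal always implies best-effort.
```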
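The sketch below, also not from the paper, follows the two steps quoted in the Pseudocode row, but replaces the general LTL/LTLf goal ϕ with a plain reachability goal (the paper defers the general case to standard techniques such as [Bianco and de Alfaro, 1995]). The helper names make_uniform_mdp and optimal_reachability_policy, and the toy domain, are illustrative assumptions.

```python
# Sketch of the quoted two-step algorithm, specialized to a reachability goal.
# All names and the toy domain below are illustrative, not taken from the paper.

def make_uniform_mdp(support):
    """Step 1: complete a support function delta(s, a) -> nonempty successor set
    into an MDP by putting the uniform distribution on each support set."""
    return {(s, a): {t: 1.0 / len(succs) for t in succs}
            for (s, a), succs in support.items()}

def optimal_reachability_policy(mdp, states, goal, iters=1000):
    """Step 2 (reachability case): value iteration for the maximal probability
    of eventually reaching a goal state, plus one maximizing action per state."""
    value = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s in goal:
                continue
            qs = [sum(p * value[t] for t, p in dist.items())
                  for (s2, a), dist in mdp.items() if s2 == s]
            if qs:
                value[s] = max(qs)
    best = {}
    for (s, a), dist in mdp.items():
        q = sum(p * value[t] for t, p in dist.items())
        if s not in best or q > best[s][0]:
            best[s] = (q, a)
    return {s: a for s, (q, a) in best.items()}, value

# Toy domain: from s0, 'try' may reach the goal or a dead end; 'give_up' cannot.
# No strong-cyclic policy exists (the goal is not reachable with probability 1),
# but the optimal policy still picks 'try'.
support = {
    ("s0", "try"): {"goal", "dead"},
    ("s0", "give_up"): {"dead"},
    ("dead", "stay"): {"dead"},
    ("goal", "stay"): {"goal"},
}
mdp = make_uniform_mdp(support)
policy, value = optimal_reachability_policy(mdp, {"s0", "goal", "dead"}, {"goal"})
print(policy["s0"], value["s0"])  # expected: try 0.5 under the uniform completion
```

In this particular toy domain, changing the transition probabilities while keeping each support set fixed leaves the maximizing action at s0 unchanged, which matches the quoted robustness of best-effort policies to probability changes that preserve the support.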