reproducibilityindex.ai

Optimistic Planning in Markov Decision Processes Using a Generative Model

Authors: Balázs Szörényi, Gunnar Kedenburg, Remi Munos

NeurIPS 2014

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our main contribution is a planning algorithm, called StOP (for Stochastic Optimistic Planning) that achieves a polynomial sample complexity in terms of ε (which can be regarded as the leading parameter in this problem), and which is, in terms of this complexity, competitive to other algorithms that can exploit more specifics of their respective domains. It benefits from possible reward or transition probability structures, and does not require any special restriction or knowledge about the MDP besides having access to a generative model. The sample complexity bound is more involved than in previous works, but can be upper-bounded by: $(1/\epsilon)^{2 + \frac{\log \kappa}{\log(1/\gamma)} + o(1)}$ (1) ... Section 3 presents the consistency and sample complexity results.
Researcher Affiliation	Collaboration	Balázs Szörényi, INRIA Lille Nord Europe, SequeL project, France / MTA-SZTE Research Group on Artificial Intelligence, Hungary, balazs.szorenyi@inria.fr; Gunnar Kedenburg, INRIA Lille Nord Europe, SequeL project, France, gunnar.kedenburg@inria.fr; Remi Munos, INRIA Lille Nord Europe, SequeL project, France, remi.munos@inria.fr. Current affiliation: Google DeepMind
Pseudocode	Yes	Algorithm 1: StOP(s0, δ0, ε, γ); Algorithm 2: BoundValue(Π, δ); Algorithm 3: Sample(Π, s, m)
Open Source Code	No	The paper does not provide any statements about releasing open-source code for the described methodology or links to a code repository.
Open Datasets	No	The paper is theoretical and does not involve experiments with datasets or training.
Dataset Splits	No	The paper is theoretical and does not involve experiments with datasets or validation splits.
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require hardware specifications.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers required for reproducibility.
Experiment Setup	No	The paper is theoretical and does not describe any experimental setup details such as hyperparameters or system-level training settings.
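
The sample complexity bound quoted in the Research Type row depends on ε only through the exponent 2 + log κ / log(1/γ) + o(1). As a rough illustration, the following Python sketch evaluates the leading part of that exponent and the resulting scale (1/ε)^exponent. It is not code from the paper: the function names are hypothetical, the o(1) term and all constants are ignored, and the values of κ, γ, and ε passed in are arbitrary choices for illustration.

```python
import math


def leading_exponent(kappa: float, gamma: float) -> float:
    """Leading-order exponent 2 + log(kappa) / log(1/gamma).

    Assumption: the o(1) term of the quoted bound is dropped, so this is
    only the dominant part of the exponent, not the full bound.
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return 2.0 + math.log(kappa) / math.log(1.0 / gamma)


def rough_sample_scale(eps: float, kappa: float, gamma: float) -> float:
    """(1/eps) raised to the leading exponent; constants are ignored."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return (1.0 / eps) ** leading_exponent(kappa, gamma)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend.
    for kappa in (1.0, 2.0, 4.0):
        exponent = leading_exponent(kappa, gamma=0.5)
        scale = rough_sample_scale(0.1, kappa, gamma=0.5)
        print(f"kappa={kappa}: exponent={exponent:.2f}, scale={scale:.0f}")
```

With κ = 1 the logarithmic term vanishes and the leading exponent is 2; larger κ or a discount factor γ closer to 1 increases the exponent.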