Optimistic Planning in Markov Decision Processes Using a Generative Model

Authors: Balázs Szörényi, Gunnar Kedenburg, Remi Munos

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | Our main contribution is a planning algorithm, called StOP (for Stochastic Optimistic Planning) that achieves a polynomial sample complexity in terms of ε (which can be regarded as the leading parameter in this problem), and which is, in terms of this complexity, competitive to other algorithms that can exploit more specifics of their respective domains. It benefits from possible reward or transition probability structures, and does not require any special restriction or knowledge about the MDP besides having access to a generative model. The sample complexity bound is more involved than in previous works, but can be upper-bounded by: $(1/\varepsilon)^{2 + \frac{\log \kappa}{\log(1/\gamma)} + o(1)}$ (1) ... Section 3 presents the consistency and sample complexity results. |
| Researcher Affiliation | Collaboration | Balázs Szörényi, INRIA Lille Nord Europe, SequeL project, France / MTA-SZTE Research Group on Artificial Intelligence, Hungary, balazs.szorenyi@inria.fr; Gunnar Kedenburg, INRIA Lille Nord Europe, SequeL project, France, gunnar.kedenburg@inria.fr; Remi Munos, INRIA Lille Nord Europe, SequeL project, France, remi.munos@inria.fr. Current affiliation: Google DeepMind |
| Pseudocode | Yes | Algorithm 1 StOP(s0, δ0, ε, γ); Algorithm 2 BoundValue(Π, δ); Algorithm 3 Sample(Π, s, m) |
| Open Source Code | No | The paper does not provide any statements about releasing open-source code for the described methodology or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not involve experiments with datasets or training. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments with datasets or validation splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or system-level training settings. |
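
To get a feel for the sample complexity bound quoted in the Research Type row, the sketch below evaluates its leading exponent, 2 + log κ / log(1/γ), for given values of κ and the discount factor γ. It is only an illustration: the function names are hypothetical, the o(1) term is dropped, and the accuracy parameter is assumed to be the ε that the extracted text appears to have lost. It is not code from the paper.

```python
import math


def leading_exponent(kappa: float, gamma: float) -> float:
    """Exponent 2 + log(kappa) / log(1/gamma), with the o(1) term ignored.

    Both names are illustrative; kappa and gamma follow the symbols in the
    quoted bound. We only assume kappa > 0 and 0 < gamma < 1.
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return 2.0 + math.log(kappa) / math.log(1.0 / gamma)


def leading_sample_bound(eps: float, kappa: float, gamma: float) -> float:
    """Leading-order value (1/eps) ** exponent, again dropping the o(1) term."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return (1.0 / eps) ** leading_exponent(kappa, gamma)


if __name__ == "__main__":
    # Print the exponent for a few illustrative (kappa, gamma) pairs.
    for kappa, gamma in [(1.0, 0.9), (2.0, 0.5), (2.0, 0.9)]:
        print(kappa, gamma, leading_exponent(kappa, gamma))
```

When κ = 1 the logarithm in the numerator vanishes and the exponent reduces to 2 for any γ; for κ > 1 the exponent grows as γ approaches 1, because log(1/γ) shrinks toward zero.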