Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimistic Planning in Markov Decision Processes Using a Generative Model
Authors: Balázs Szörényi, Gunnar Kedenburg, Remi Munos
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is a planning algorithm, called StOP (for Stochastic Optimistic Planning) that achieves a polynomial sample complexity in terms of ε (which can be regarded as the leading parameter in this problem), and which is, in terms of this complexity, competitive to other algorithms that can exploit more specifics of their respective domains. It benefits from possible reward or transition probability structures, and does not require any special restriction or knowledge about the MDP besides having access to a generative model. The sample complexity bound is more involved than in previous works, but can be upper-bounded by: $(1/\varepsilon)^{2 + \frac{\log \kappa}{\log(1/\gamma)} + o(1)}$ (1) ... Section 3 presents the consistency and sample complexity results. |
| Researcher Affiliation | Collaboration | Balázs Szörényi INRIA Lille Nord Europe, SequeL project, France / MTA-SZTE Research Group on Artificial Intelligence, Hungary EMAIL Gunnar Kedenburg INRIA Lille Nord Europe, SequeL project, France EMAIL Remi Munos INRIA Lille Nord Europe, SequeL project, France EMAIL Current affiliation: Google DeepMind |
| Pseudocode | Yes | Algorithm 1: StOP(s0, δ0, ε, γ); Algorithm 2: BoundValue(Π, δ); Algorithm 3: Sample(Π, s, m) |
| Open Source Code | No | The paper does not provide any statements about releasing open-source code for the described methodology or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not involve experiments with datasets or training. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments with datasets or validation splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or system-level training settings. |