reproducibilityindex.ai

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We consider random discounted MDPs with infinite horizon in which the maximal number B of successor states and the sparsity of rewards are controlled... For various values of the desired accuracy ε and of the corresponding planning horizon H = log_γ(ε(1 − γ)/2) (see Section 2), we run simulations on 200 random MDPs.
Researcher Affiliation	Collaboration	Anders Jonsson, Universitat Pompeu Fabra, anders.jonsson@upf.edu; Emilie Kaufmann, CNRS & ULille (CRIStAL), Inria Scool, emilie.kaufmann@univ-lille.fr; Pierre Ménard, Inria Lille, Scool team, pierre.menard@inria.fr; Omar Darwiche Domingues, Inria Lille, Scool team, omar.darwiche-domingues@inria.fr; Edouard Leurent, Renault & Inria Lille, Scool team, edouard.leurent@inria.fr; Michal Valko, DeepMind Paris, valkom@deepmind.com
Pseudocode	Yes	A generic implementation of MDP-GapE is given in Algorithm 1 in Appendix A, where we also discuss some implementation details.
Open Source Code	Yes	The source code of our experiments is available at https://eleurent.github.io/planning-gap-complexity/
Open Datasets	No	The paper uses 'random discounted MDPs' and describes their generation process but does not provide access information (link, DOI, citation) for a publicly available or open dataset.
Dataset Splits	No	The paper describes running simulations on randomly generated MDPs but does not specify training, validation, or test dataset splits.
Hardware Specification	No	The paper does not specify any particular hardware details such as GPU/CPU models or memory specifications used for running experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup	Yes	Table 3b: MDP-GapE parameters. Discount factor γ: 0.7; confidence level δ: 0.1; exploration function β^r(n_h^t, δ): log(1/δ) + log n_h^t; exploration function β^p(n_h^t, δ): log(1/δ) + log n_h^t
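
The planning horizon and exploration functions quoted in the table can be evaluated numerically. The snippet below is a minimal sketch, not the authors' code: the function names `planning_horizon` and `exploration_bonus` are hypothetical, and rounding the horizon up to the next integer is an assumption, since the formula as quoted here shows no rounding. It uses the reported values γ = 0.7 and δ = 0.1.

```python
import math


def planning_horizon(epsilon: float, gamma: float) -> int:
    """Horizon H = log_gamma(epsilon * (1 - gamma) / 2), rounded up."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    # Change of base: log_gamma(x) = log(x) / log(gamma).
    raw = math.log(epsilon * (1.0 - gamma) / 2.0) / math.log(gamma)
    # Assumption: round up so that H is an integer number of steps.
    return math.ceil(raw)


def exploration_bonus(n: int, delta: float) -> float:
    """Exploration function log(1/delta) + log(n) for a visit count n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.log(1.0 / delta) + math.log(n)


if __name__ == "__main__":
    gamma, delta = 0.7, 0.1  # values reported in Table 3b
    for epsilon in (1.0, 0.5, 0.1):  # illustrative accuracies, not from the paper
        print(f"epsilon={epsilon}: H={planning_horizon(epsilon, gamma)}")
    for n in (1, 10, 100):
        print(f"n={n}: beta={exploration_bonus(n, delta):.3f}")
```

The table gives the same expression for β^r and β^p, so a single function covers both in this sketch.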