Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider random discounted MDPs with infinite horizon in which the maximal number B of successor states and the sparsity of rewards are controlled... For various values of the desired accuracy ε and of the corresponding planning horizon $H = \log_{\gamma}(\varepsilon(1-\gamma)/2)$ (see Section 2), we run simulations on 200 random MDPs.
Researcher Affiliation | Collaboration | Anders Jonsson, Universitat Pompeu Fabra (anders.jonsson@upf.edu); Emilie Kaufmann, CNRS & ULille (CRIStAL), Inria Scool (emilie.kaufmann@univ-lille.fr); Pierre Ménard, Inria Lille, Scool team (pierre.menard@inria.fr); Omar Darwiche Domingues, Inria Lille, Scool team (omar.darwiche-domingues@inria.fr); Edouard Leurent, Renault & Inria Lille, Scool team (edouard.leurent@inria.fr); Michal Valko, DeepMind Paris (valkom@deepmind.com)
Pseudocode | Yes | A generic implementation of MDP-GapE is given in Algorithm 1 in Appendix A, where we also discuss some implementation details.
Open Source Code | Yes | The source code of our experiments is available at https://eleurent.github.io/planning-gap-complexity/
Open Datasets | No | The paper uses 'random discounted MDPs' and describes their generation process but does not provide access information (link, DOI, citation) for a publicly available or open dataset.
Dataset Splits | No | The paper describes running simulations on randomly generated MDPs but does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify any hardware details, such as GPU/CPU models or memory, used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup | Yes | Table 3b (MDP-GapE parameters): discount factor γ = 0.7; confidence level δ = 0.1; exploration function $\beta_r(n_h^t, \delta) = \log(1/\delta) + \log n_h^t$; exploration function $\beta_p(n_h^t, \delta) = \log(1/\delta) + \log n_h^t$.
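The planning horizon and the exploration functions listed above can be sketched in Python. This is a minimal sketch, not the authors' released code. It assumes rewards in [0, 1], so that truncating after H steps loses at most γ^H/(1 − γ) ≤ ε/2, and it rounds H up to an integer. The generator `random_sparse_mdp` and its `branching` and `reward_prob` parameters are hypothetical illustrations of "B successor states" and "reward sparsity", not the paper's generation procedure.

```python
import math
import random

# Values reported in Table 3b (MDP-GapE parameters).
GAMMA = 0.7  # discount factor
DELTA = 0.1  # confidence level


def planning_horizon(epsilon: float, gamma: float = GAMMA) -> int:
    """Smallest integer H >= log_gamma(epsilon * (1 - gamma) / 2).

    Assumes rewards in [0, 1]: the discounted reward collected after step H
    is then at most gamma**H / (1 - gamma), which this H keeps <= epsilon / 2.
    Rounding up to an integer is an assumption of this sketch.
    """
    target = epsilon * (1 - gamma) / 2
    return math.ceil(math.log(target) / math.log(gamma))


def exploration_bonus(n: int, delta: float = DELTA) -> float:
    """beta(n, delta) = log(1/delta) + log(n), the form given for beta_r and beta_p."""
    if n < 1:
        raise ValueError("count n must be at least 1")
    return math.log(1 / delta) + math.log(n)


def random_sparse_mdp(n_states: int, n_actions: int, branching: int,
                      reward_prob: float, seed: int = 0):
    """HYPOTHETICAL generator of a random MDP with controlled sparsity.

    Each (state, action) pair gets `branching` distinct successor states
    with random normalized probabilities, and a nonzero mean reward in
    (0, 1] only with probability `reward_prob`. Illustration only.
    """
    rng = random.Random(seed)
    transitions, rewards = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.sample(range(n_states), branching)
            # 1 - random() lies in (0, 1], so the total weight is positive.
            weights = [1.0 - rng.random() for _ in successors]
            total = sum(weights)
            transitions[(s, a)] = {t: w / total for t, w in zip(successors, weights)}
            has_reward = rng.random() < reward_prob
            rewards[(s, a)] = (1.0 - rng.random()) if has_reward else 0.0
    return transitions, rewards


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01):
        print(f"epsilon={eps}: H={planning_horizon(eps)}")
    print(f"beta(n=10) = {exploration_bonus(10):.3f}")
    P, R = random_sparse_mdp(n_states=20, n_actions=2, branching=3, reward_prob=0.2)
    print(f"{sum(r > 0 for r in R.values())} of {len(R)} state-action pairs are rewarded")
```

With γ = 0.7, this rounding gives H = 6, 12 and 19 for ε = 1, 0.1 and 0.01, which shows how quickly the horizon, and hence the planning tree, grows as the target accuracy tightens.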