Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider random discounted MDPs with infinite horizon in which the maximal number B of successor states and the sparsity of rewards are controlled... For various values of the desired accuracy ε and of the corresponding planning horizon $H = \log_\gamma(\varepsilon(1-\gamma)/2)$ (see Section 2), we run simulations on 200 random MDPs. |
| Researcher Affiliation | Collaboration | Anders Jonsson, Universitat Pompeu Fabra, anders.jonsson@upf.edu; Emilie Kaufmann, CNRS & ULille (CRIStAL), Inria Scool, emilie.kaufmann@univ-lille.fr; Pierre Ménard, Inria Lille, Scool team, pierre.menard@inria.fr; Omar Darwiche Domingues, Inria Lille, Scool team, omar.darwiche-domingues@inria.fr; Edouard Leurent, Renault & Inria Lille, Scool team, edouard.leurent@inria.fr; Michal Valko, DeepMind Paris, valkom@deepmind.com |
| Pseudocode | Yes | A generic implementation of MDP-GapE is given in Algorithm 1 in Appendix A, where we also discuss some implementation details. |
| Open Source Code | Yes | The source code of our experiments is available at https://eleurent.github.io/planning-gap-complexity/ |
| Open Datasets | No | The paper uses 'random discounted MDPs' and describes their generation process but does not provide access information (link, DOI, citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes running simulations on randomly generated MDPs but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any particular hardware details such as GPU/CPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | Table 3b (MDP-GapE parameters): discount factor γ = 0.7; confidence level δ = 0.1; exploration function $\beta^r(n_h^t, \delta) = \log(1/\delta) + \log n_h^t$; exploration function $\beta^p(n_h^t, \delta) = \log(1/\delta) + \log n_h^t$. |
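
The two formulas quoted in the table, the planning horizon and the exploration functions, can be evaluated directly. The snippet below is a minimal sketch rather than the authors' code: the function names `planning_horizon` and `exploration_threshold` are hypothetical, rounding the horizon up to an integer is an assumption (the quoted text gives only the logarithm), and the ε values in the loop are illustrative and not taken from the paper.

```python
import math


def planning_horizon(epsilon: float, gamma: float) -> int:
    """Evaluate log_gamma(epsilon * (1 - gamma) / 2), rounded up.

    The ceiling is an assumption made here so that H is an integer.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    target = epsilon * (1.0 - gamma) / 2.0
    # Change of base: log_gamma(x) = ln(x) / ln(gamma).
    # Clamp at 0 for the case target >= 1, where no lookahead is needed.
    return max(0, math.ceil(math.log(target) / math.log(gamma)))


def exploration_threshold(n: int, delta: float) -> float:
    """log(1/delta) + log(n), the form listed for both exploration functions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.log(1.0 / delta) + math.log(n)


if __name__ == "__main__":
    gamma, delta = 0.7, 0.1  # values listed under Experiment Setup
    for eps in (1.0, 0.5, 0.2):  # illustrative accuracies only
        print(f"epsilon={eps}: H={planning_horizon(eps, gamma)}")
    for n in (1, 10, 100):
        print(f"n={n}: beta={exploration_threshold(n, delta):.3f}")
```

With γ = 0.7, the horizon grows only logarithmically as ε shrinks, and the exploration threshold grows logarithmically in the visit count n.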