Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Authors: Max Simchowitz, Kevin G. Jamieson
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper establishes that optimistic algorithms attain gap-dependent and nonasymptotic logarithmic regret for episodic MDPs. [...] The key technique in our analysis is a novel clipped regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs. |
| Researcher Affiliation | Academia | Max Simchowitz UC Berkeley msimchow@berkeley.edu Kevin Jamieson University of Washington jamieson@cs.washington.edu |
| Pseudocode | Yes | Strong Euler is based on carefully selected bonuses from [14], and formally instantiated in Algorithm 1 in Appendix E. |
| Open Source Code | No | The paper does not mention the availability of open-source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on regret bounds for MDPs. It does not conduct empirical studies, so no publicly available datasets are used or mentioned. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments that would require dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments; therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe an implementation or empirical experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis of algorithms. It does not describe an experimental setup with hyperparameters or system-level training settings. |