Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

Authors: Max Simchowitz, Kevin G. Jamieson

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. [...] The key technique in our analysis is a novel clipped regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs."
Researcher Affiliation | Academia | "Max Simchowitz, UC Berkeley, msimchow@berkeley.edu; Kevin Jamieson, University of Washington, jamieson@cs.washington.edu"
Pseudocode | Yes | "StrongEuler is based on carefully selected bonuses from [14], and formally instantiated in Algorithm 1 in Appendix E."
Open Source Code | No | The paper does not mention open-source code for the described methodology, nor does it link to a code repository.
Open Datasets | No | The paper is theoretical, focusing on regret bounds for MDPs; it reports no empirical studies and therefore names no publicly available training datasets.
Dataset Splits | No | The paper is theoretical and describes no experiments that would require training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and reports no empirical experiments, so no hardware specifications are given.
Software Dependencies | No | The paper is theoretical and describes no implementation or empirical experiments that would require versioned software dependencies.
Experiment Setup | No | The paper focuses on mathematical analysis of algorithms and describes no experimental setup, hyperparameters, or system-level training settings.
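The "clipped regret decomposition" quoted in the Research Type row rests on a simple clipping operator, which (up to notation) zeroes out any per-step surplus that falls below a gap-dependent threshold. A minimal sketch, with the threshold values chosen purely for illustration:

```python
def clip(x: float, eps: float) -> float:
    """Clipping operator from gap-dependent analyses:
    pass x through if it meets the threshold eps, else clip it to 0."""
    return x if x >= eps else 0.0

# Surpluses below the threshold contribute nothing to the clipped
# decomposition, which is how small-gap terms are controlled:
print(clip(0.3, 0.5))  # clipped away
print(clip(0.8, 0.5))  # passes through unchanged
```

Note that `clip(x, eps) <= x` always holds, so replacing surpluses by their clipped versions can only tighten an upper bound on regret.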
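The Pseudocode row notes that StrongEuler is built from carefully selected exploration bonuses (those of [14] are Bernstein-style, using empirical variance). As a rough illustration of the generic shape such bonuses take, not the paper's actual construction, here is a Hoeffding-style bonus that shrinks with the visit count of a state-action pair:

```python
import math

def hoeffding_bonus(n_visits: int, horizon: int, delta: float = 0.05) -> float:
    """Illustrative Hoeffding-style bonus b(s,a) ~ H * sqrt(log(1/delta) / n).
    StrongEuler uses sharper, variance-aware bonuses; this only shows the
    generic optimism-inducing shape."""
    if n_visits == 0:
        return float(horizon)  # no data yet: be maximally optimistic
    return min(
        float(horizon),
        horizon * math.sqrt(2.0 * math.log(1.0 / delta) / n_visits),
    )

# The bonus decays as a state-action pair is visited more often:
for n in (0, 10, 1000):
    print(n, round(hoeffding_bonus(n, horizon=5), 3))
```

Adding such a bonus to estimated rewards keeps the computed value function an upper bound on the true optimal value with high probability, which is the optimism property the paper's analysis applies to.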