Is Q-Learning Provably Efficient?
Authors: Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret O(H3SAT), where S and A are the numbers of states and actions, H is the number of steps per episode, and T is the total number of steps. This sample efficiency matches the optimal regret that can be achieved by any model-based approach, up to a single H factor. To the best of our knowledge, this is the first analysis in the model-free setting that establishes T regret without requiring access to a simulator. |
| Researcher Affiliation | Collaboration | Chi Jin University of California, Berkeley chijin@cs.berkeley.edu Zeyuan Allen-Zhu Microsoft Research, Redmond zeyuan@csail.mit.edu Sebastien Bubeck Microsoft Research, Redmond sebubeck@microsoft.com Michael I. Jordan University of California, Berkeley jordan@cs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Q-learning with UCB-Hoeffding |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This paper is theoretical and focuses on mathematical proofs and analysis of Q-learning, not empirical evaluation using specific datasets. Therefore, no dataset information is provided. |
| Dataset Splits | No | This paper is theoretical and does not involve empirical experiments with dataset splits. |
| Hardware Specification | No | This paper is theoretical and does not report on experiments requiring hardware specifications. |
| Software Dependencies | No | This paper is theoretical and does not report on experiments requiring specific software dependencies and their versions. |
| Experiment Setup | No | This paper is theoretical and does not involve empirical experiments with specific setup details or hyperparameters. |