Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Authors: Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei, Yuejie Chi

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we sharpen the sample complexity of synchronous Q-learning to the order of |S||A| / ((1 − γ)^4 ε^2) (up to some logarithmic factor) for any 0 < ε < 1, leading to an order-wise improvement in the dependence on 1/(1 − γ). Analogous results are derived for finite-horizon MDPs as well. Our sample complexity analysis unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage. Our result is obtained by identifying novel error decompositions and recursions, which might shed light on how to study other variants of Q-learning. (The headline bound is restated in LaTeX below the table.)
Researcher Affiliation | Academia | 1) Department of Electronic Engineering, Tsinghua University; 2) Department of Electrical and Computer Engineering, Princeton University; 3) Department of Statistics and Data Science, Carnegie Mellon University; 4) Department of Electrical and Computer Engineering, Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1: Synchronous Q-learning for infinite-horizon discounted MDPs; Algorithm 2: Synchronous Q-learning for finite-horizon MDPs. (A minimal illustrative sketch of the synchronous update appears after the table.)
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | This paper is theoretical and focuses on sample complexity analysis for Q-learning in MDPs. It does not use or refer to any specific public or open datasets for training or evaluation.
Dataset Splits | No | This paper is theoretical and focuses on sample complexity analysis for Q-learning in MDPs. It does not involve experimental validation with dataset splits.
Hardware Specification | No | This paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | This paper is theoretical and does not describe any experiments or implementations that would require specific software dependencies with version numbers.
Experiment Setup | No | This paper is theoretical and focuses on mathematical analysis of Q-learning. It does not describe an experimental setup with specific hyperparameters or training settings.
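
For reference, the headline bound quoted in the Research Type row can be written out explicitly. The LaTeX snippet below restates it as given in the abstract, with the tilde-O notation absorbing the logarithmic factor; it is a transcription of the quoted claim, not an independent derivation.

\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Sample complexity of synchronous Q-learning, as quoted from the abstract;
% \widetilde{O}(\cdot) hides the logarithmic factor.
\[
  \widetilde{O}\!\left( \frac{|\mathcal{S}|\,|\mathcal{A}|}{(1-\gamma)^{4}\,\varepsilon^{2}} \right)
  \quad \text{for any } 0 < \varepsilon < 1,
\]
which constitutes an order-wise improvement in the dependence on
$\tfrac{1}{1-\gamma}$ for vanilla (synchronous) Q-learning.
\end{document}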
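
The table names Algorithm 1 (synchronous Q-learning for infinite-horizon discounted MDPs) but does not reproduce it. The Python sketch below is a minimal, hypothetical rendering of a synchronous Q-learning iteration in that spirit, assuming access to a generative model sample_next_state(s, a, rng) and a deterministic reward table; the step-size schedule shown is illustrative and not necessarily the one analyzed in the paper.

import numpy as np

def synchronous_q_learning(sample_next_state, r, gamma, num_iters, seed=0):
    """Illustrative synchronous Q-learning on a discounted MDP.

    sample_next_state(s, a, rng) -> next-state index drawn from P(. | s, a)
    r: (num_states, num_actions) array of rewards (assumed deterministic here)
    gamma: discount factor in (0, 1)
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = r.shape
    Q = np.zeros((num_states, num_actions))
    for t in range(1, num_iters + 1):
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)  # illustrative step size, not the paper's exact choice
        Q_next = np.empty_like(Q)
        # Synchronous setting: every (s, a) pair receives a fresh sample at every iteration.
        for s in range(num_states):
            for a in range(num_actions):
                s_prime = sample_next_state(s, a, rng)
                target = r[s, a] + gamma * Q[s_prime].max()
                Q_next[s, a] = (1.0 - eta) * Q[s, a] + eta * target
        Q = Q_next
    return Q

The per-(s, a) loop is what distinguishes the synchronous setting: each state-action pair is updated with an independent fresh sample at every iteration, rather than along a single sampled trajectory.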