Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a γ-discounted MDP with state space S and action space A, we demonstrate that the ℓ∞-based sample complexity of classical asynchronous Q-learning, namely, the number of samples needed to yield an entrywise ε-accurate estimate of the Q-function, is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. |
| Researcher Affiliation | Academia | Gen Li (Tsinghua); Yuting Wei (CMU); Yuejie Chi (CMU); Yuantao Gu (Tsinghua); Yuxin Chen (Princeton) |
| Pseudocode | Yes | Algorithm 1: Asynchronous Q-learning; Algorithm 2: Asynchronous variance-reduced Q-learning; Algorithm 3: function Q = VR-Q-RUN-EPOCH (illustrative sketches of the asynchronous and variance-reduced updates follow the table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no links to repositories or explicit statements about code release. |
| Open Datasets | No | The paper is theoretical and does not use or reference any datasets for training or empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental validation or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not detail any software dependencies or specific version numbers required for reproducing empirical results. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis, not on specific experimental setups with hyperparameters or training configurations for empirical runs. |
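
To make the setting described in the abstract concrete, below is a minimal sketch of classical asynchronous Q-learning with a constant learning rate, run on a single Markovian trajectory generated by a behavior policy. The transition kernel `P`, reward table `R`, `behavior_policy`, and all numeric choices are illustrative assumptions for this sketch; the paper is theoretical and reports no experiments.

```python
import numpy as np

def async_q_learning(P, R, behavior_policy, gamma=0.9, eta=0.1, num_steps=100_000, seed=0):
    """Asynchronous Q-learning along a single Markovian trajectory.

    P: transition kernel of shape (S, A, S); R: reward table of shape (S, A);
    behavior_policy: matrix of shape (S, A) whose rows sum to one.
    Only the (state, action) pair visited at each step is updated, with a
    constant learning rate eta, matching the setting the paper analyzes.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    s = rng.integers(S)                               # arbitrary initial state
    for _ in range(num_steps):
        a = rng.choice(A, p=behavior_policy[s])       # action drawn from the behavior policy
        s_next = rng.choice(S, p=P[s, a])             # Markovian transition
        target = R[s, a] + gamma * Q[s_next].max()    # empirical Bellman backup
        Q[s, a] = (1 - eta) * Q[s, a] + eta * target  # update only the visited entry
        s = s_next
    return Q

# Usage on a small, randomly generated MDP (purely illustrative):
rng = np.random.default_rng(1)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))            # random transition kernel
R = rng.uniform(size=(S, A))                          # rewards in [0, 1]
pi_b = np.full((S, A), 1.0 / A)                       # uniform behavior policy
Q_hat = async_q_learning(P, R, pi_b)
```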
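Algorithms 2 and 3 layer variance reduction on top of these asynchronous updates. As a rough illustration only, the sketch below shows the recentered update a single epoch might perform, assuming a reference Q-function `Q_bar`, a batch estimate `T_bar` of the Bellman operator applied to `Q_bar`, and a hypothetical `sample_step` helper that yields the next transition from the behavior trajectory; epoch scheduling and the construction of `T_bar` are omitted and the details here are not the paper's exact pseudocode.

```python
def vr_q_epoch(Q_init, Q_bar, T_bar, sample_step, gamma=0.9, eta=0.1, num_updates=10_000):
    """One epoch of recentered (variance-reduced) asynchronous Q-learning updates.

    Q_bar is held fixed during the epoch; T_bar (shape (S, A)) is a batch
    estimate of the Bellman operator at Q_bar computed from earlier samples.
    """
    Q = Q_init.copy()
    for _ in range(num_updates):
        s, a, r, s_next = sample_step()               # next transition from the trajectory
        # Recentered target: empirical backup of Q, minus the empirical backup
        # of Q_bar on the same sample, plus the low-variance batch estimate.
        target = (r + gamma * Q[s_next].max()) \
                 - (r + gamma * Q_bar[s_next].max()) \
                 + T_bar[s, a]
        Q[s, a] = (1 - eta) * Q[s, a] + eta * target
    return Q
```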