reproducibilityindex.ai

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Asynchronous Q-learning aims to learn the optimal action-value function (or Qfunction) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a γ-discounted MDP with state space S and action space A, we demonstrate that the ℓ -based sample complexity of classical asynchronous Q-learning namely, the number of samples needed to yield an entrywise ε-accurate estimate of the Q-function is at most on the order of 1 µmin(1 γ)5ε2 + tmix µmin(1 γ) up to some logarithmic factor, provided that a proper constant learning rate is adopted.
Researcher Affiliation	Academia	Gen Li Tsinghua Yuting Wei CMU Yuejie Chi CMU Yuantao Gu Tsinghua Yuxin Chen Princeton
Pseudocode	Yes	Algorithm 1: Asynchronous Q-learning; Algorithm 2: Asynchronous variance-reduced Q-learning; Algorithm 3: function Q = VR-Q-RUN-EPOCH
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described. There are no links to repositories or explicit statements about code release.
Open Datasets	No	The paper is theoretical and does not use or reference any datasets for training or empirical evaluation.
Dataset Splits	No	The paper is theoretical and does not involve experimental validation or dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require hardware specifications.
Software Dependencies	No	The paper is theoretical and does not detail any software dependencies or specific version numbers required for reproducing empirical results.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and analysis, not on specific experimental setups with hyperparameters or training configurations for empirical runs.