Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning

Authors: Andrea Zanette, Martin Wainwright

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state of the art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. [...] The main contribution of this paper is to design and analyze a variant of the Q-learning algorithm that is guaranteed to minimize regret over the class of low-rank MDPs. (An illustrative sketch of these three ingredients appears after the table.)
Researcher Affiliation | Academia | (1) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA; (2) Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, USA. Correspondence to: Andrea Zanette <zanette@berkeley.edu>, Martin J. Wainwright <wainwrig@berkeley.edu>.
Pseudocode | Yes | Algorithm 1 S3Q-LEARNING [...] Algorithm 2 S4Q-LEARNING
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and analysis, providing regret bounds and proofs. It does not involve training models on specific datasets, hence no information about publicly available datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments. Therefore, no dataset splits for training, validation, or testing are provided.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, not on empirical experimentation. Therefore, it does not specify any hardware used for running experiments.
Software Dependencies | No | The paper is theoretical, describing algorithms and proofs. It does not mention any specific software dependencies or version numbers required to implement or replicate the work.
Experiment Setup | No | The paper is primarily theoretical, focusing on the design and analysis of Q-learning variants with formal guarantees. It does not include empirical experiments or details about an experimental setup, such as hyperparameters or system-level training settings.
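To make the three quoted ingredients concrete, below is a minimal illustrative sketch of a linear Q-learning update that combines a second-order (ridge-regression) step, a frozen set of target weights, and a replay-style buffer. This is not the paper's S3Q-LEARNING or S4Q-LEARNING algorithm; every name here (LinearQSketch, sync_every, the feature map, the discounted objective) is an assumption made for illustration.

# Illustrative sketch only -- NOT the paper's S3Q-/S4Q-LEARNING.
# Assumes a known feature map phi(s, a) in R^d and a discounted objective
# (the paper's setting is finite-horizon); all names are hypothetical.
import numpy as np

class LinearQSketch:
    def __init__(self, dim, gamma=0.99, reg=1.0, sync_every=100):
        self.dim = dim
        self.gamma = gamma              # discount factor (assumption for this sketch)
        self.reg = reg                  # ridge regularization strength
        self.sync_every = sync_every    # how often the target weights are refreshed
        self.w = np.zeros(dim)          # online weights
        self.w_target = np.zeros(dim)   # frozen target weights (target-network analogue)
        self.buffer = []                # replay-style store of past transitions
        self.updates = 0

    def q_values(self, phi_candidates):
        """Q-values for each action's feature row, phi_candidates: (num_actions, dim)."""
        return phi_candidates @ self.w

    def store(self, phi_sa, reward, phi_next_candidates, done=False):
        """phi_next_candidates: (num_actions, dim) feature rows for the next state."""
        self.buffer.append((phi_sa, reward, phi_next_candidates, done))

    def update(self):
        """Second-order step: solve a ridge regression over the replayed transitions,
        with bootstrap targets computed from the frozen target weights."""
        cov = self.reg * np.eye(self.dim)
        vec = np.zeros(self.dim)
        for phi_sa, reward, phi_next, done in self.buffer:
            bootstrap = 0.0 if done else self.gamma * float(np.max(phi_next @ self.w_target))
            target = reward + bootstrap
            cov += np.outer(phi_sa, phi_sa)
            vec += target * phi_sa
        self.w = np.linalg.solve(cov, vec)   # least-squares step instead of a first-order SGD update
        self.updates += 1
        if self.updates % self.sync_every == 0:
            self.w_target = self.w.copy()    # periodic target sync

In a usage loop, store() would be called after each transition and update() at the end of each episode. The ridge-regression solve stands in for the "second order update rule", the replayed buffer for the "mechanism akin to experience replay", and w_target for the target networks; how the paper's algorithms actually combine and schedule these pieces is specified in its Algorithms 1 and 2, not reproduced here.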