Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning
Authors: Andrea Zanette, Martin Wainwright
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. [...] The main contribution of this paper is to design and analyze a variant of the Q-learning algorithm that is guaranteed to minimize regret over the class of low-rank MDPs. (See the illustrative sketch below the table.) |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA 2Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, USA. Correspondence to: Andrea Zanette <zanette@berkeley.edu>, Martin J. Wainwright <wainwrig@berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 S3Q-LEARNING [...] Algorithm 2 S4Q-LEARNING |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and analysis, providing regret bounds and proofs. It does not involve training models on specific datasets, hence no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments. Therefore, no dataset splits for training, validation, or testing are provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, not on empirical experimentation. Therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical, describing algorithms and proofs. It does not mention any specific software dependencies or version numbers required to implement or replicate the work. |
| Experiment Setup | No | The paper is primarily theoretical, focusing on the design and analysis of Q-learning variants with formal guarantees. It does not include empirical experiments or details about an experimental setup, such as hyperparameters or system-level training settings. |
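
The Research Type row above lists the algorithmic ingredients the paper combines: a second-order update rule, target networks, and a replay-like mechanism for Q-learning with linear function approximation. Below is a minimal, hedged sketch of how those ingredients fit together in code. It is not the paper's S3Q/S4Q-learning: the toy environment, feature map, discount factor, target-sync schedule, and all hyperparameters are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch: Q-learning with a linear architecture, a periodically synced
# target network, a replay buffer, and a second-order (regularized least-squares)
# update. All sizes and schedules below are assumptions for illustration.

rng = np.random.default_rng(0)

n_states, n_actions, d = 5, 2, 4      # toy problem sizes (assumed)
horizon, episodes = 10, 200           # assumed episode length and count
gamma = 0.9                           # assumed discount factor

# Assumed feature map phi(s, a) in R^d, plus a toy transition kernel and reward.
phi = rng.normal(size=(n_states, n_actions, d))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

w = np.zeros(d)          # online weights: Q(s, a) ~= phi(s, a) @ w
w_target = np.zeros(d)   # target-network weights, synced infrequently
replay = []              # simple experience buffer

for ep in range(episodes):
    s = rng.integers(n_states)
    for _ in range(horizon):
        # epsilon-greedy action from the current linear Q estimate
        if rng.random() < 0.1:
            a = rng.integers(n_actions)
        else:
            a = int(np.argmax(phi[s] @ w))
        s_next = rng.choice(n_states, p=P[s, a])
        replay.append((s, a, R[s, a], s_next))
        s = s_next

    # Second-order update: refit w by ridge-regularized least squares against
    # targets bootstrapped from the *target* network over replayed transitions.
    Sigma = np.eye(d)
    b = np.zeros(d)
    for (s0, a0, r, s1) in replay:
        target = r + gamma * np.max(phi[s1] @ w_target)
        x = phi[s0, a0]
        Sigma += np.outer(x, x)
        b += x * target
    w = np.linalg.solve(Sigma, b)

    # Periodically sync the target network (assumed schedule).
    if ep % 10 == 0:
        w_target = w.copy()

print("learned weights:", np.round(w, 3))
```

The sketch uses discounted bootstrapping and a fixed target-sync period purely to keep the example short and stable; the paper's finite-horizon algorithms and their regret analysis differ in how targets, replay, and the second-order statistics are managed.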