Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning

Authors: Andrea Zanette, Martin Wainwright

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state of the art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. [...] The main contribution of this paper is to design and analyze a variant of the Q-learning algorithm that is guaranteed to minimize regret over the class of low-rank MDPs. (An illustrative sketch of these three ingredients appears after the table.)
Researcher Affiliation | Academia | (1) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA; (2) Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, USA. Correspondence to: Andrea Zanette <zanette@berkeley.edu>, Martin J. Wainwright <wainwrig@berkeley.edu>.
Pseudocode | Yes | Algorithm 1 S3Q-LEARNING [...] Algorithm 2 S4Q-LEARNING
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and analysis, providing regret bounds and proofs. It does not involve training models on specific datasets, hence no information about publicly available datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments. Therefore, no dataset splits for training, validation, or testing are provided.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, not on empirical experimentation. Therefore, it does not specify any hardware used for running experiments.
Software Dependencies | No | The paper is theoretical, describing algorithms and proofs. It does not mention any specific software dependencies or version numbers required to implement or replicate the work.
Experiment Setup | No | The paper is primarily theoretical, focusing on the design and analysis of Q-learning variants with formal guarantees. It does not include empirical experiments or details about an experimental setup, such as hyperparameters or system-level training settings.
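To make the three quoted ingredients concrete, below is a minimal illustrative sketch of a linear Q-learning update that combines a second-order (ridge-regression) step, a frozen set of target weights, and a replay-style buffer. This is not the paper's S3Q-LEARNING or S4Q-LEARNING algorithm; every name here (LinearQSketch, sync_every, the feature map, the discounted objective) is an assumption made for illustration.

# Illustrative sketch only -- NOT the paper's S3Q-/S4Q-LEARNING.
# Assumes a known feature map phi(s, a) in R^d and a discounted objective
# (the paper's setting is finite-horizon); all names are hypothetical.
import numpy as np

class LinearQSketch:
    def __init__(self, dim, gamma=0.99, reg=1.0, sync_every=100):
        self.dim = dim
        self.gamma = gamma              # discount factor (assumption for this sketch)
        self.reg = reg                  # ridge regularization strength
        self.sync_every = sync_every    # how often the target weights are refreshed
        self.w = np.zeros(dim)          # online weights
        self.w_target = np.zeros(dim)   # frozen target weights (target-network analogue)
        self.buffer = []                # replay-style store of past transitions
        self.updates = 0

    def q_values(self, phi_candidates):
        """Q-values for each action's feature row, phi_candidates: (num_actions, dim)."""
        return phi_candidates @ self.w

    def store(self, phi_sa, reward, phi_next_candidates, done=False):
        """phi_next_candidates: (num_actions, dim) feature rows for the next state."""
        self.buffer.append((phi_sa, reward, phi_next_candidates, done))

    def update(self):
        """Second-order step: solve a ridge regression over the replayed transitions,
        with bootstrap targets computed from the frozen target weights."""
        cov = self.reg * np.eye(self.dim)
        vec = np.zeros(self.dim)
        for phi_sa, reward, phi_next, done in self.buffer:
            bootstrap = 0.0 if done else self.gamma * float(np.max(phi_next @ self.w_target))
            target = reward + bootstrap
            cov += np.outer(phi_sa, phi_sa)
            vec += target * phi_sa
        self.w = np.linalg.solve(cov, vec)   # least-squares step instead of a first-order SGD update
        self.updates += 1
        if self.updates % self.sync_every == 0:
            self.w_target = self.w.copy()    # periodic target sync

In a usage loop, store() would be called after each transition and update() at the end of each episode. The ridge-regression solve stands in for the "second order update rule", the replayed buffer for the "mechanism akin to experience replay", and w_target for the target networks; how the paper's algorithms actually combine and schedule these pieces is specified in its Algorithms 1 and 2, not reproduced here.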