Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Convergence Rates of Policy Gradient Methods
Authors: Lin Xiao
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | First, we develop a theory of weak gradient-mapping dominance and use it to prove sharp sublinear convergence rate of the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods... enjoy a linear rate of convergence... Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model. |
| Researcher Affiliation | Industry | Lin Xiao, EMAIL, Meta AI Research, Seattle, WA 98109, USA |
| Pseudocode | No | The paper specifies its methods through equations, such as equation (22) for the projected policy gradient method and equation (39) for the policy mirror descent methods, but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | No | The paper is theoretical and analyzes convergence rates and sample complexity under a 'simple generative model' (Section 5.1). It does not perform experiments on publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not present any experimental results, so there is no mention of hardware specifications used. |
| Software Dependencies | No | The paper is theoretical and does not detail any experimental implementation, so it does not list software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on convergence analysis, not experimental implementation. Therefore, it does not provide details on experimental setup or hyperparameter values. |