Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

On the Convergence Rates of Policy Gradient Methods

Authors: Lin Xiao

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	First, we develop a theory of weak gradient-mapping dominance and use it to prove sharp sublinear convergence rate of the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods... enjoy a linear rate of convergence... Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.
Researcher Affiliation	Industry	Lin Xiao EMAIL Meta AI Research Seattle, WA 98109, USA
Pseudocode	No	The paper describes methods like the projected policy gradient method using equation (22) and policy mirror descent methods using equation (39), but does not present them in structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials.
Open Datasets	No	The paper is theoretical and analyzes convergence rates and sample complexity under a 'simple generative model' (Section 5.1). It does not perform experiments on publicly available datasets.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with datasets, thus no dataset splits are discussed.
Hardware Specification	No	The paper is theoretical and does not present any experimental results, so there is no mention of hardware specifications used.
Software Dependencies	No	The paper is theoretical and does not detail any experimental implementation, thus it does not list software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on convergence analysis, not experimental implementation. Therefore, it does not provide details on experimental setup or hyperparameter values.