Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
Authors: Long Yang, Qian Zheng, Gang Pan10630-10638
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our result shows that policy gradient converges to an (ϵ, ϵχ)SOSP with probability at least 1 e O(δ) after the total cost 2 (1 γ) χ log 1 2 ), where γ (0, 1). It sig- nificantly improves the state of the art cost e O(ϵ 9).Our analysis is based on the key idea that decomposes the parameter space Rp into three non-intersected regions: non-stationary point region, saddle point region, and local optimal region, then making a local improvement of the objective of RL in each region. This technique can be potentially generalized to extensive policy gradient methods. For the complete proof, please refer to https://arxiv.org/pdf/2012.01491.pdf. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Zhejiang University, China. 2School of Electrical and Electronic Engineering, Nanyang Technological University,Singapore. 1 EMAIL; EMAIL |
| Pseudocode | No | The paper describes update rules like (5) (θk+1 = θk + α \ J(θk)), but these are equations within the text and not presented as formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide a statement or a link indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any datasets for training experiments. Example 1 describes a conceptual MDP, not a dataset used for empirical training. |
| Dataset Splits | No | The paper is theoretical and does not mention any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on complexity analysis rather than detailing an experimental setup, hyperparameters, or training configurations. |