reproducibilityindex.ai

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Authors: Long Yang, Qian Zheng, Gang Pan10630-10638

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our result shows that policy gradient converges to an (ϵ, ϵχ)SOSP with probability at least 1 e O(δ) after the total cost 2 (1 γ) χ log 1 2 ), where γ (0, 1). It sig- niﬁcantly improves the state of the art cost e O(ϵ 9).Our analysis is based on the key idea that decomposes the parameter space Rp into three non-intersected regions: non-stationary point region, saddle point region, and local optimal region, then making a local improvement of the objective of RL in each region. This technique can be potentially generalized to extensive policy gradient methods. For the complete proof, please refer to https://arxiv.org/pdf/2012.01491.pdf.
Researcher Affiliation	Academia	1College of Computer Science and Technology, Zhejiang University, China. 2School of Electrical and Electronic Engineering, Nanyang Technological University,Singapore. 1 {yanglong,gpan}@zju.edu.cn; 2zhengqian@ntu.edu.sg
Pseudocode	No	The paper describes update rules like (5) (θk+1 = θk + α \ J(θk)), but these are equations within the text and not presented as formal pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide a statement or a link indicating that source code for the described methodology is publicly available.
Open Datasets	No	The paper is theoretical and does not describe the use of any datasets for training experiments. Example 1 describes a conceptual MDP, not a dataset used for empirical training.
Dataset Splits	No	The paper is theoretical and does not mention any training, validation, or test dataset splits.
Hardware Specification	No	The paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies	No	The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on complexity analysis rather than detailing an experimental setup, hyperparameters, or training configurations.