Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

Authors: Zuyue Fu, Zhuoran Yang, Zhaoran Wang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "For both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear O(K^{-1/2}) rate, where K is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we prove that actor-critic with deep neural network finds the globally optimal policy at a sublinear rate for the first time." (An illustrative formalization of this rate is given after this table.)
Researcher Affiliation | Academia | Zuyue Fu (Northwestern University, zuyue.fu@u.northwestern.edu); Zhuoran Yang (Princeton University, zy6@princeton.edu); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com)
Pseudocode | Yes | Algorithm 1: Linear Actor-Critic Method; Algorithm 2: Deep Neural Actor-Critic Method; Algorithm 3: Actor Update for Deep Neural Actor-Critic Method; Algorithm 4: Critic Update for Deep Neural Actor-Critic Method. (A minimal code sketch of the single-timescale linear actor-critic loop follows after this table.)
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is a theoretical work and does not describe experiments on any specific dataset, so no publicly available training datasets are mentioned.
Dataset Splits | No | As a theoretical paper, it provides no training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not report experimental hardware specifications.
Software Dependencies | No | The paper is theoretical and does not report versioned software dependencies needed for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations.
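The convergence claim quoted above is taken verbatim from the abstract. For orientation, a sublinear O(K^{-1/2}) global-optimality rate is commonly formalized as a bound of the following shape, where pi* is a globally optimal policy, pi_k the actor iterate at step k, J the expected return, and C a problem-dependent constant. This is an illustrative template only, not the paper's exact theorem statement, which may differ in constants and in whether it bounds the minimum or the average suboptimality:

```latex
\min_{0 \le k \le K} \bigl( J(\pi^\star) - J(\pi_k) \bigr) \;\le\; \frac{C}{\sqrt{K}}
```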
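Since the paper provides pseudocode (Algorithms 1-4) but no public code, the following minimal Python sketch illustrates the single-timescale structure of a linear actor-critic in the spirit of Algorithm 1: the actor and critic are updated inside the same loop with fixed step sizes of the same order, rather than on separated (two-timescale) schedules. The toy MDP, one-hot (tabular) features, and all constants (n_states, alpha, beta, etc.) are hypothetical choices for illustration, not the paper's exact algorithm or analysis settings.

```python
import numpy as np

# Minimal single-timescale actor-critic sketch (NOT the paper's exact
# Algorithm 1): a softmax policy and a linear value critic (one-hot
# features, i.e. the tabular special case of linear function
# approximation) are updated in the same loop with fixed step sizes.
# The random MDP below and all constants are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Hypothetical random MDP: P[s, a] is a distribution over next states,
# R[s, a] a deterministic reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0, 1, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # actor parameters (softmax logits)
w = np.zeros(n_states)                   # critic parameters (linear values)
alpha, beta = 0.05, 0.1                  # actor / critic steps, same timescale

def policy(s):
    """Softmax policy over actions in state s."""
    logits = theta[s] - theta[s].max()   # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

s = 0
for k in range(20_000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the linear value estimate.
    td_error = r + gamma * w[s_next] - w[s]
    w[s] += beta * td_error

    # Actor: policy-gradient step using the TD error as the advantage
    # signal; grad log pi(a|s) for a softmax policy is one_hot(a) - pi.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha * td_error * grad_log_pi

    s = s_next

print("learned state values:", np.round(w, 3))
print("greedy actions per state:", theta.argmax(axis=1))
```

The only point the sketch is meant to make is the single-timescale coupling: the actor step uses the current TD error immediately, without waiting for the critic to converge between updates, which is the regime whose convergence and global-optimality analysis the paper claims as a first.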