Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

Authors: Zuyue Fu, Zhuoran Yang, Zhaoran Wang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "For both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear O(K^{-1/2}) rate, where K is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we prove that actor-critic with deep neural network finds the globally optimal policy at a sublinear rate for the first time." (An illustrative formalization of this rate is given after this table.)
Researcher Affiliation | Academia | Zuyue Fu (Northwestern University, zuyue.fu@u.northwestern.edu); Zhuoran Yang (Princeton University, zy6@princeton.edu); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com)
Pseudocode | Yes | Algorithm 1: Linear Actor-Critic Method; Algorithm 2: Deep Neural Actor-Critic Method; Algorithm 3: Actor Update for Deep Neural Actor-Critic Method; Algorithm 4: Critic Update for Deep Neural Actor-Critic Method. (A minimal code sketch of the single-timescale linear actor-critic loop follows after this table.)
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is a theoretical work and does not describe experiments on any specific dataset, so no publicly available training datasets are mentioned.
Dataset Splits | No | As a theoretical paper, it provides no training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not report experimental hardware specifications.
Software Dependencies | No | The paper is theoretical and does not report versioned software dependencies needed for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations.
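The convergence claim quoted above is taken verbatim from the abstract. For orientation, a sublinear O(K^{-1/2}) global-optimality rate is commonly formalized as a bound of the following shape, where pi* is a globally optimal policy, pi_k the actor iterate at step k, J the expected return, and C a problem-dependent constant. This is an illustrative template only, not the paper's exact theorem statement, which may differ in constants and in whether it bounds the minimum or the average suboptimality:

```latex
\min_{0 \le k \le K} \bigl( J(\pi^\star) - J(\pi_k) \bigr) \;\le\; \frac{C}{\sqrt{K}}
```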
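Since the paper provides pseudocode (Algorithms 1-4) but no public code, the following minimal Python sketch illustrates the single-timescale structure of a linear actor-critic in the spirit of Algorithm 1: the actor and critic are updated inside the same loop with fixed step sizes of the same order, rather than on separated (two-timescale) schedules. The toy MDP, one-hot (tabular) features, and all constants (n_states, alpha, beta, etc.) are hypothetical choices for illustration, not the paper's exact algorithm or analysis settings.

```python
import numpy as np

# Minimal single-timescale actor-critic sketch (NOT the paper's exact
# Algorithm 1): a softmax policy and a linear value critic (one-hot
# features, i.e. the tabular special case of linear function
# approximation) are updated in the same loop with fixed step sizes.
# The random MDP below and all constants are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Hypothetical random MDP: P[s, a] is a distribution over next states,
# R[s, a] a deterministic reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0, 1, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # actor parameters (softmax logits)
w = np.zeros(n_states)                   # critic parameters (linear values)
alpha, beta = 0.05, 0.1                  # actor / critic steps, same timescale

def policy(s):
    """Softmax policy over actions in state s."""
    logits = theta[s] - theta[s].max()   # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

s = 0
for k in range(20_000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the linear value estimate.
    td_error = r + gamma * w[s_next] - w[s]
    w[s] += beta * td_error

    # Actor: policy-gradient step using the TD error as the advantage
    # signal; grad log pi(a|s) for a softmax policy is one_hot(a) - pi.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha * td_error * grad_log_pi

    s = s_next

print("learned state values:", np.round(w, 3))
print("greedy actions per state:", theta.argmax(axis=1))
```

The only point the sketch is meant to make is the single-timescale coupling: the actor step uses the current TD error immediately, without waiting for the critic to converge between updates, which is the regime whose convergence and global-optimality analysis the paper claims as a first.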