Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Authors: Zuyue Fu, Zhuoran Yang, Zhaoran Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | For both cases [linear and deep neural network function approximation], we prove that the actor sequence converges to a globally optimal policy at a sublinear $O(K^{-1/2})$ rate, where $K$ is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time. Moreover, under the broader scope of policy optimization with nonlinear function approximation, we prove that actor-critic with deep neural network finds the globally optimal policy at a sublinear rate for the first time. |
| Researcher Affiliation | Academia | Zuyue Fu Northwestern University zuyue.fu@u.northwestern.edu Zhuoran Yang Princeton University zy6@princeton.edu Zhaoran Wang Northwestern University zhaoranwang@gmail.com |
| Pseudocode | Yes | Algorithm 1: Linear Actor-Critic Method; Algorithm 2: Deep Neural Actor-Critic Method; Algorithm 3: Actor Update for Deep Neural Actor-Critic Method; Algorithm 4: Critic Update for Deep Neural Actor-Critic Method |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is a theoretical work and does not describe experiments performed on any specific dataset; thus, there is no mention of publicly available datasets for training. |
| Dataset Splits | No | The paper is theoretical, so no experimental data splits for training, validation, or testing are provided. |
| Hardware Specification | No | The paper is theoretical and does not report experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not report specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations. |
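The paper provides pseudocode (Algorithm 1 onward) but no released code. To illustrate what "single-timescale" means, the following is a minimal Python sketch, not the paper's Algorithm 1. It assumes a hypothetical 2-state, 2-action MDP with known dynamics and one-hot features $\phi(s,a)$, so the linear critic $\phi(s,a)^\top\omega$ is simply a table and the least-squares projection of the Bellman target is exact. In each iteration the actor takes one softmax (mirror-descent style) step, $\pi_{k+1}(\cdot\mid s) \propto \pi_k(\cdot\mid s)\exp(\beta Q_k(s,\cdot))$, which is an assumed form of the policy update, and the critic applies the Bellman evaluation operator exactly once under the new policy, with no inner evaluation loop. The MDP, the step size `beta`, the iteration count, and all function names are hypothetical illustration choices.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustration only, not from the paper).
# P[s][a] is the next-state distribution, R[s][a] the immediate reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays in 0, action 1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: action 0 moves to 0, action 1 stays in 1
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA = 0.9
S, A = 2, 2


def softmax(logits):
    # Subtract the max for numerical stability; the logits grow roughly linearly in k.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def bellman_eval(q, pi):
    # One application of the Bellman evaluation operator T^pi:
    # (T^pi q)(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) * sum_{a'} pi(a'|s') q(s', a').
    v = [sum(pi[s][a] * q[s][a] for a in range(A)) for s in range(S)]
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def single_timescale_ac(num_iters=500, beta=0.5):
    # With one-hot features the linear critic phi^T omega is a table,
    # so fitting the Bellman target by least squares reduces to copying it.
    theta = [[0.0] * A for _ in range(S)]  # actor logits
    q = [[0.0] * A for _ in range(S)]      # critic estimate Q_k
    pi = [softmax(theta[s]) for s in range(S)]
    for _ in range(num_iters):
        # Actor: adding beta * Q_k to the logits gives pi_{k+1} proportional to
        # pi_k * exp(beta * Q_k).
        for s in range(S):
            for a in range(A):
                theta[s][a] += beta * q[s][a]
        pi = [softmax(theta[s]) for s in range(S)]
        # Critic: a single Bellman evaluation step under the new policy,
        # so actor and critic advance on the same timescale.
        q = bellman_eval(q, pi)
    return pi, q


def optimal_q(num_iters=2000):
    # Value iteration, used only as a reference to check the sketch.
    q = [[0.0] * A for _ in range(S)]
    for _ in range(num_iters):
        v = [max(q[s]) for s in range(S)]
        q = [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
              for a in range(A)] for s in range(S)]
    return q


if __name__ == "__main__":
    pi, q = single_timescale_ac()
    print("policy:", pi)
    print("critic:", q)
    print("Q*:", optimal_q())
```

On this toy MDP, action 1 is optimal in both states ($V^*(0)=18$, $V^*(1)=20$ with $\gamma=0.9$), and the sketch's policy concentrates on it while the critic approaches $Q^*$. The example does not attempt to reproduce or measure the paper's $O(K^{-1/2})$ rate.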