A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
Authors: Yue Frank Wu, Weitong Zhang, Pan Xu, Quanquan Gu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\lVert \nabla J(\theta) \rVert_2^2 \le \epsilon$) of the non-concave performance function $J(\theta)$, with $\tilde{O}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods. |
| Researcher Affiliation | Academia | Yue Wu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, ywu@cs.ucla.edu; Weitong Zhang, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, weightzero@cs.ucla.edu; Pan Xu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, panxu@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Two Time-Scale Actor-Critic |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository for the described methodology. |
| Open Datasets | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not mention specific datasets or their public availability for training. |
| Dataset Splits | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not provide details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. No hardware details are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is purely theoretical and focuses on mathematical analysis and proofs. It does not describe any experimental setup details such as hyperparameters or system-level training settings. |