A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

Authors: Yue Frank Wu, Weitong Zhang, Pan Xu, Quanquan Gu

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\lVert \nabla J(\theta) \rVert_2^2 \le \epsilon$) of the non-concave performance function $J(\theta)$, with $\tilde{O}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods. |
| Researcher Affiliation | Academia | Yue Wu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, ywu@cs.ucla.edu; Weitong Zhang, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, weightzero@cs.ucla.edu; Pan Xu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, panxu@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1: Two Time-Scale Actor-Critic |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository for the described methodology. |
| Open Datasets | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not mention specific datasets or their public availability for training. |
| Dataset Splits | No | The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not provide details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. No hardware details are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is purely theoretical and focuses on mathematical analysis and proofs. It does not describe any experimental setup details such as hyperparameters or system-level training settings. |
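
The paper's Algorithm 1 is not reproduced on this page. As a rough illustration of what "two time-scale" and "non-i.i.d." mean in this setting, the Python sketch below runs a generic actor-critic loop on a single Markovian trajectory, with the actor step size decaying faster than the critic step size. It should not be read as the authors' algorithm: the toy two-state MDP, the average-reward TD(0) critic, the step-size constants and exponents (`c_alpha`, `c_beta`, `sigma`, `nu`), and the clamp radius are all assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def step(state, action, rng):
    """Hypothetical 2-state MDP: action 1 usually leads to state 1, which pays reward 1."""
    p_to_one = 0.9 if action == 1 else 0.1
    next_state = 1 if rng.random() < p_to_one else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def two_timescale_actor_critic(num_steps=20000, seed=0, c_alpha=0.5,
                               c_beta=0.5, sigma=0.6, nu=0.4, radius=10.0):
    """Generic sketch; all constants are illustrative assumptions."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: softmax preferences
    omega = [0.0] * n_states  # critic: tabular differential value estimates
    eta = 0.0                 # running estimate of the average reward
    state = 0
    for t in range(num_steps):
        # sigma > nu, so the actor step shrinks faster than the critic step.
        alpha = c_alpha / (1 + t) ** sigma
        beta = c_beta / (1 + t) ** nu
        probs = softmax(theta[state])
        action = rng.choices(range(n_actions), weights=probs)[0]
        # Samples come from one trajectory, so they are correlated (non-i.i.d.).
        next_state, reward = step(state, action, rng)
        # Average-reward TD error, computed before any update.
        delta = reward - eta + omega[next_state] - omega[state]
        # Critic update (faster time scale), clamped to keep estimates bounded.
        eta += beta * (reward - eta)
        omega[state] = max(-radius, min(radius, omega[state] + beta * delta))
        # Actor update (slower time scale) along delta * grad log pi(action | state).
        for a in range(n_actions):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha * delta * grad_log
        state = next_state
    return theta, omega, eta
```

Calling `two_timescale_actor_critic()` returns the learned preferences, the critic values, and the average-reward estimate; applying `softmax` to a row of `theta` gives the policy in that state. The only design point the sketch tries to convey is the separation of step sizes: the critic is updated with the larger step so that it can track the slowly changing policy.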