reproducibilityindex.ai

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

Authors: Yue Frank Wu, Weitong Zhang, Pan Xu, Quanquan Gu

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., ‖∇J(θ)‖₂² ≤ ϵ) of the non-concave performance function J(θ), with Õ(ϵ^{-2.5}) sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
Researcher Affiliation	Academia	Yue Wu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, ywu@cs.ucla.edu; Weitong Zhang, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, weightzero@cs.ucla.edu; Pan Xu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, panxu@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu
Pseudocode	Yes	Algorithm 1 Two Time-Scale Actor-Critic
Open Source Code	No	The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository for the described methodology.
Open Datasets	No	The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not mention specific datasets or their public availability for training.
Dataset Splits	No	The paper is purely theoretical and does not involve experimental evaluation on datasets. Therefore, it does not provide details about training, validation, or test dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup that would require hardware specifications. No hardware details are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any experimental setup. Therefore, it does not list specific software dependencies with version numbers.
Experiment Setup	No	The paper is purely theoretical and focuses on mathematical analysis and proofs. It does not describe any experimental setup details such as hyperparameters or system-level training settings.
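
For readers unfamiliar with the method named in the Pseudocode row, the following is a minimal sketch of a generic two time-scale actor-critic loop. It is not the paper's Algorithm 1 and is not code released by the authors. The toy two-state MDP, the tabular softmax actor, the discounted TD(0) critic, the step-size constants and exponents, and the names `softmax` and `two_time_scale_actor_critic` are all assumptions made for illustration. The only property it aims to show is the two time-scale structure: the critic step size decays more slowly than the actor step size, so the critic tracks the value of a slowly changing policy.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def two_time_scale_actor_critic(num_steps=20000, gamma=0.9, seed=0):
    """Run a toy actor-critic loop on a single trajectory (non-i.i.d. samples).

    Hypothetical toy MDP (an assumption, not from the paper): two states,
    two actions, action 1 pays reward 1, action 0 pays reward 0, and the
    next state is drawn uniformly at random.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    w = [0.0] * n_states                                  # critic values
    s = 0
    for t in range(num_steps):
        # Assumed schedules: the actor step decays faster than the critic
        # step, so alpha / beta -> 0 as t grows (two time scales).
        alpha = 0.5 / (1 + t) ** 0.6  # actor, slow time scale
        beta = 0.5 / (1 + t) ** 0.4   # critic, fast time scale

        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if a == 1 else 0.0
        s_next = rng.randrange(n_states)

        # TD(0) error, used both as the critic target and as the
        # advantage estimate in the actor update.
        delta = r + gamma * w[s_next] - w[s]
        w[s] += beta * delta

        # Policy-gradient step; for a softmax policy the gradient of
        # log pi(a|s) with respect to theta[s][b] is 1{a == b} - pi(b|s).
        for b in range(n_actions):
            indicator = 1.0 if b == a else 0.0
            theta[s][b] += alpha * delta * (indicator - probs[b])

        s = s_next
    return theta, w
```

Calling `theta, w = two_time_scale_actor_critic()` and then `softmax(theta[0])` gives the learned action probabilities in state 0. On this toy problem the policy should shift toward the rewarding action, though the sketch carries none of the paper's guarantees.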