Convergence of Actor-Critic with Multi-Layer Neural Networks

Authors: Haoxing Tian, Alex Olshevsky, Yannis Paschalidis

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Theoretical "The early theory of actor-critic methods considered convergence... In this work we are taking the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice." and "The main contribution of this paper is to provide the first analysis of AC with neural networks of arbitrary depth. While replicating the earlier results of an O(T^{-0.5}) convergence rate and O(m^{-0.5}) error, our work considers a single-loop method with proportional step-sizes (sometimes called single-timescale). We prove this result under Markov sampling and project onto a ball of constant radius around the initial condition."
Researcher Affiliation Academia Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky Department of Electrical and Computer Engineering Boston University Boston, MA 02215, USA {tianhx, yannisp, alexols}@bu.edu
Pseudocode Yes Algorithm 1 details the algorithm considered in this paper.
Algorithm 1 Actor-Critic
Require: number of iterations T, learning rates α_w and α_θ, projection set W.
Initialize θ_0, b_r, and w^(k) such that |b_r| = 1 for all r, and every entry of w^(k) is drawn from N(0, 1). Initialize the starting state-action pair (s_0, a_0).
for t ∈ {1, 2, . . . , T} do
  Sample s_t ∼ P_env(s | s_{t−1}, a_{t−1}), a_t ∼ π(a | s_t, θ_t), s′_t ∼ P_env(s | s_t, a_t), a′_t ∼ π(a | s′_t, θ_t).
  Sample Ô_t by first sampling a random variable T with P(T = t) = (1 − γ)γ^t, and second obtaining T transitions by starting at s_0 and taking actions according to π(a | s, θ_t).
  Compute δ_t, f(O_t, w_t), g(Ô_t, w_t, θ_t), and update w_{t+1} and θ_{t+1} as
    w_{t+1} = Proj_W { w_t + α_w f(O_t, w_t) },
    θ_{t+1} = θ_t − (α_θ / (1 − γ)) g(Ô_t, w_t, θ_t).
end for
Open Source Code No The paper does not mention providing open-source code for the described methodology.
Open Datasets No The paper is purely theoretical and does not involve experimental training on a dataset.
Dataset Splits No The paper is purely theoretical and does not describe experimental validation or dataset splits.
Hardware Specification No The paper is purely theoretical and does not mention any specific hardware used for experiments.
Software Dependencies No The paper is purely theoretical and does not specify any software dependencies with version numbers required for an experimental setup.
Experiment Setup No The paper is purely theoretical and does not describe any specific experimental setup details or hyperparameters.
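Since the paper itself ships no code, the single-timescale loop described in the Pseudocode row can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's method: a linear critic over a 2-state MDP stands in for the deep-network critic f(O_t, w_t), the environment, rewards, step sizes, and projection radius are all invented, and the actor step uses a plain REINFORCE-style gradient weighted by the TD error.

```python
import numpy as np

# Minimal single-timescale actor-critic sketch on a toy 2-state MDP.
# Assumptions (not from the paper): linear critic V_w(s) = w[s] instead of
# a multi-layer network; invented transitions P, rewards R, and constants.

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9
# P[s, a] is the next-state distribution; R[s, a] is the reward (toy values).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def policy(theta, s):
    """Softmax policy pi(a | s, theta) over the actions available in state s."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def project_ball(w, w0, radius):
    """Project w onto a ball of the given radius around the initialization w0,
    mirroring the paper's projection onto a constant-radius ball around w_0."""
    d = w - w0
    norm = np.linalg.norm(d)
    return w0 + d * (radius / norm) if norm > radius else w

T = 5000
alpha_w, alpha_theta = 0.05, 0.05   # proportional (single-timescale) step sizes
w0 = rng.normal(size=n_states)      # critic initialization w_0
w = w0.copy()
theta = np.zeros((n_states, n_actions))
s = 0

for t in range(T):
    probs = policy(theta, s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    # Critic: TD error delta_t, then a projected TD(0) step near w_0.
    delta = R[s, a] + gamma * w[s_next] - w[s]
    w = project_ball(w + alpha_w * delta * np.eye(n_states)[s], w0, radius=50.0)
    # Actor: score-function gradient of log pi(a | s, theta), scaled by delta_t
    # and the 1/(1 - gamma) factor from Algorithm 1's update.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += (alpha_theta / (1 - gamma)) * delta * grad_log
    s = s_next
```

Note the single loop: both parameters are updated at every iteration with step sizes of the same order, rather than running an inner critic loop to convergence as in double-loop or two-timescale analyses.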