Convergence of Actor-Critic with Multi-Layer Neural Networks
Authors: Haoxing Tian, Alex Olshevsky, Yannis Paschalidis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | "The early theory of actor-critic methods considered convergence... In this work we are taking the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice." and "The main contribution of this paper is to provide the first analysis of AC with neural networks of arbitrary depth. While replicating the earlier results of a O(T^{-0.5}) convergence rate and O(m^{-0.5}) error, our work considers a single-loop method with proportional step-sizes (sometimes called *single-timescale*). We prove this result under Markov sampling and project onto a ball of constant radius around the initial condition." |
| Researcher Affiliation | Academia | Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky Department of Electrical and Computer Engineering Boston University Boston, MA 02215, USA {tianhx, yannisp, alexols}@bu.edu |
| Pseudocode | Yes | Algorithm 1 details the algorithm considered in this paper. **Algorithm 1 Actor-Critic.** Require: number of iterations T, learning rates αw and αθ, projection set W. Initialize θ0, br, and w(k) such that \|br\| ≤ 1 for all r, with every entry of w(k) drawn from N(0, 1). Initialize the starting state-action pair (s0, a0). For t ∈ {1, 2, ..., T}: sample st ∼ Penv(·\|st−1, at−1), at ∼ π(·\|st, θt), s′t ∼ Penv(·\|st, at), a′t ∼ π(·\|s′t, θt). Sample Ôt by first sampling a random variable T with P(T = t) = (1 − γ)γ^t, and second obtaining T transitions by starting at s0 and taking actions according to π(·\|s, θt). Compute δt, f(Ot, wt), g(Ôt, wt, θt), and update wt+1 = Proj_W{wt + αw f(Ot, wt)}, θt+1 = θt − (αθ/(1 − γ)) g(Ôt, wt, θt). |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is purely theoretical and does not involve experimental training on a dataset. |
| Dataset Splits | No | The paper is purely theoretical and does not describe experimental validation or dataset splits. |
| Hardware Specification | No | The paper is purely theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is purely theoretical and does not specify any software dependencies with version numbers required for an experimental setup. |
| Experiment Setup | No | The paper is purely theoretical and does not describe any specific experimental setup details or hyperparameters. |
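The paper itself is purely theoretical, but the single-timescale structure of Algorithm 1 (a projected critic update and a proportional actor update from the same trajectory) can be illustrated concretely. Below is a minimal sketch on a toy tabular MDP with a linear critic standing in for the paper's neural-network critic; the transition kernel, rewards, feature map, step sizes, and projection radius are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative; not from the paper).
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state dist
R = rng.uniform(size=(n_states, n_actions))                        # reward r(s, a)
gamma = 0.9

def features(s, a):
    # One-hot features over (s, a); a stand-in for the neural critic.
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

def policy(s, theta):
    # Tabular softmax policy pi(a | s, theta).
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def project(w, w0, radius):
    # Projection onto a ball of constant radius around the critic init w0,
    # mirroring the paper's Proj_W step.
    d = w - w0
    n = np.linalg.norm(d)
    return w0 + d * (radius / n) if n > radius else w

# Proportional ("single-timescale") step sizes: both updates every iteration.
T, alpha_w, alpha_th, radius = 2000, 0.05, 0.05, 10.0
w0 = rng.normal(size=n_states * n_actions)   # critic init, entries ~ N(0, 1)
w = w0.copy()
theta = np.zeros((n_states, n_actions))      # actor init
s = 0
a = rng.choice(n_actions, p=policy(s, theta))

for t in range(T):
    # Markov sampling: advance the chain one step under the current policy.
    s2 = rng.choice(n_states, p=P[s, a])
    a2 = rng.choice(n_actions, p=policy(s2, theta))
    # TD error and semi-gradient critic direction f(O_t, w_t).
    delta = R[s, a] + gamma * features(s2, a2) @ w - features(s, a) @ w
    f = delta * features(s, a)
    # Actor direction: score function weighted by the critic's Q estimate.
    q = features(s, a) @ w
    grad_log = -policy(s, theta)
    grad_log[a] += 1.0
    # Single-timescale updates, with the critic projected each step.
    w = project(w + alpha_w * f, w0, radius)
    theta[s] += (alpha_th / (1.0 - gamma)) * q * grad_log
    s, a = s2, a2
```

This sketch omits the paper's geometric-horizon sampling of Ôt (it reuses the on-trajectory sample instead) and uses policy-gradient *ascent*; it is meant only to show the projected, proportional-step update pattern, not to reproduce the analysis.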