Convergence of Actor-Critic with Multi-Layer Neural Networks

Authors: Haoxing Tian, Alex Olshevsky, Yannis Paschalidis

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Theoretical "The early theory of actor-critic methods considered convergence... In this work we are taking the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice." and "The main contribution of this paper is to provide the first analysis of AC with neural networks of arbitrary depth. While replicating the earlier results of an O(T^{-0.5}) convergence rate and O(m^{-0.5}) error, our work considers a single-loop method with proportional step-sizes (sometimes called single-timescale). We prove this result under Markov sampling and project onto a ball of constant radius around the initial condition."
Researcher Affiliation Academia Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky Department of Electrical and Computer Engineering Boston University Boston, MA 02215, USA {tianhx, yannisp, alexols}@bu.edu
Pseudocode Yes Algorithm 1 details the algorithm considered in this paper.
Algorithm 1 Actor-Critic
Require: number of iterations T, learning rates α_w and α_θ, projection set W.
Initialize θ_0, b_r, and w^(k) such that |b_r| = 1 for all r, and every entry of w^(k) is drawn from N(0, 1). Initialize the starting state-action pair (s_0, a_0).
for t ∈ {1, 2, . . . , T} do
  Sample s_t ∼ P_env(s | s_{t−1}, a_{t−1}), a_t ∼ π(a | s_t, θ_t), s′_t ∼ P_env(s | s_t, a_t), a′_t ∼ π(a | s′_t, θ_t).
  Sample Ô_t by first sampling a random variable T with P(T = t) = (1 − γ)γ^t, and second obtaining T transitions by starting at s_0 and taking actions according to π(a | s, θ_t).
  Compute δ_t, f(O_t, w_t), g(Ô_t, w_t, θ_t), and update w_{t+1} and θ_{t+1} as
    w_{t+1} = Proj_W { w_t + α_w f(O_t, w_t) },
    θ_{t+1} = θ_t − (α_θ / (1 − γ)) g(Ô_t, w_t, θ_t).
end for
Open Source Code No The paper does not mention providing open-source code for the described methodology.
Open Datasets No The paper is purely theoretical and does not involve experimental training on a dataset.
Dataset Splits No The paper is purely theoretical and does not describe experimental validation or dataset splits.
Hardware Specification No The paper is purely theoretical and does not mention any specific hardware used for experiments.
Software Dependencies No The paper is purely theoretical and does not specify any software dependencies with version numbers required for an experimental setup.
Experiment Setup No The paper is purely theoretical and does not describe any specific experimental setup details or hyperparameters.
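Since the paper itself ships no code, the single-timescale loop described in the Pseudocode row can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's method: a linear critic over a 2-state MDP stands in for the deep-network critic f(O_t, w_t), the environment, rewards, step sizes, and projection radius are all invented, and the actor step uses a plain REINFORCE-style gradient weighted by the TD error.

```python
import numpy as np

# Minimal single-timescale actor-critic sketch on a toy 2-state MDP.
# Assumptions (not from the paper): linear critic V_w(s) = w[s] instead of
# a multi-layer network; invented transitions P, rewards R, and constants.

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9
# P[s, a] is the next-state distribution; R[s, a] is the reward (toy values).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def policy(theta, s):
    """Softmax policy pi(a | s, theta) over the actions available in state s."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def project_ball(w, w0, radius):
    """Project w onto a ball of the given radius around the initialization w0,
    mirroring the paper's projection onto a constant-radius ball around w_0."""
    d = w - w0
    norm = np.linalg.norm(d)
    return w0 + d * (radius / norm) if norm > radius else w

T = 5000
alpha_w, alpha_theta = 0.05, 0.05   # proportional (single-timescale) step sizes
w0 = rng.normal(size=n_states)      # critic initialization w_0
w = w0.copy()
theta = np.zeros((n_states, n_actions))
s = 0

for t in range(T):
    probs = policy(theta, s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    # Critic: TD error delta_t, then a projected TD(0) step near w_0.
    delta = R[s, a] + gamma * w[s_next] - w[s]
    w = project_ball(w + alpha_w * delta * np.eye(n_states)[s], w0, radius=50.0)
    # Actor: score-function gradient of log pi(a | s, theta), scaled by delta_t
    # and the 1/(1 - gamma) factor from Algorithm 1's update.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += (alpha_theta / (1 - gamma)) * delta * grad_log
    s = s_next
```

Note the single loop: both parameters are updated at every iteration with step sizes of the same order, rather than running an inner critic loop to convergence as in double-loop or two-timescale analyses.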