On the Second-Order Convergence of Biased Policy Gradient Algorithms

Authors: Siqiao Mu, Diego Klabjan

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution.
Researcher Affiliation | Academia | (1) Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL; (2) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL.
Pseudocode | Yes | Algorithm 1: Biased Policy Gradient Algorithm (illustrative sketches of the two gradient estimators it covers appear after this table).
Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not report experiments on specific datasets, so no public dataset information is provided.
Dataset Splits | No | The paper is theoretical and does not report empirical validation, so no training/validation/test dataset splits are specified.
Hardware Specification | No | The paper is theoretical and does not report experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not report experiments, so no specific software dependencies or versions are mentioned.
Experiment Setup | No | The paper is theoretical and does not report experiments, so no experimental setup details such as hyperparameters or training settings are provided.
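The first estimator named in the Research Type response is the vanilla policy gradient computed from Monte-Carlo sampling of trajectories; truncating those trajectories at a finite horizon is one standard source of the bias the paper analyzes. Below is a minimal sketch of such a truncated REINFORCE-style update. The two-state toy MDP, softmax tabular policy, horizon, batch size, and step size are illustrative assumptions, not taken from the paper or Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma = 0.9          # discount factor
H = 50               # truncation horizon (a source of bias in the estimate)
alpha = 0.1          # policy step size

# Toy MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((n_states, n_actions))   # softmax policy parameters

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def sample_trajectory(s0):
    """Roll out the current policy for H steps, recording (state, action, reward)."""
    s, traj = s0, []
    for _ in range(H):
        a = rng.choice(n_actions, p=policy(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj

def reinforce_gradient(traj):
    """Truncated gradient estimate: sum_t gamma^t * G_t * grad log pi(a_t | s_t)."""
    grad = np.zeros_like(theta)
    # discounted rewards-to-go G_t
    returns, G = [0.0] * len(traj), 0.0
    for t in reversed(range(len(traj))):
        G = traj[t][2] + gamma * G
        returns[t] = G
    for t, (s, a, _) in enumerate(traj):
        glog = -policy(s)
        glog[a] += 1.0                      # gradient of log-softmax at (s, a)
        grad[s] += (gamma ** t) * returns[t] * glog
    return grad

# One biased policy gradient ascent step from a batch of sampled trajectories.
batch = [sample_trajectory(s0=0) for _ in range(16)]
g_hat = np.mean([reinforce_gradient(tr) for tr in batch], axis=0)
theta += alpha * g_hat
```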
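The second estimator is the double-loop actor-critic, whose inner loop improves the value-function approximation via TD(0); the paper also studies TD(0) on Markov chains irrespective of the initial state distribution. The sketch below shows only that TD(0) inner loop, with linear function approximation run along a single Markov-chain trajectory from an arbitrary start state. The chain, reward vector, feature map, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, d = 5, 3
gamma = 0.9
beta = 0.05            # TD(0) step size

# Fixed Markov chain (the actor's policy already folded in) and per-state rewards.
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transition matrix
r = rng.uniform(0.0, 1.0, size=n_states)
Phi = rng.normal(size=(n_states, d))                   # state feature vectors

w = np.zeros(d)                  # critic weights: V_w(s) = Phi[s] @ w
s = rng.integers(n_states)       # arbitrary initial state, not a fixed restart distribution

for _ in range(20_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) semi-gradient update on the single ongoing trajectory
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += beta * delta * Phi[s]
    s = s_next

print("Learned value estimates:", Phi @ w)
```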