On the Second-Order Convergence of Biased Policy Gradient Algorithms

Authors: Siqiao Mu, Diego Klabjan

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution.
Researcher Affiliation | Academia | (1) Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL; (2) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL.
Pseudocode | Yes | Algorithm 1: Biased Policy Gradient Algorithm (illustrative sketches of the two gradient estimators it covers appear after this table).
Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not report experiments on specific datasets, so no public dataset information is provided.
Dataset Splits | No | The paper is theoretical and does not report empirical validation, so no training/validation/test dataset splits are specified.
Hardware Specification | No | The paper is theoretical and does not report experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not report experiments, so no specific software dependencies or versions are mentioned.
Experiment Setup | No | The paper is theoretical and does not report experiments, so no experimental setup details such as hyperparameters or training settings are provided.
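The first estimator named in the Research Type response is the vanilla policy gradient computed from Monte-Carlo sampling of trajectories; truncating those trajectories at a finite horizon is one standard source of the bias the paper analyzes. Below is a minimal sketch of such a truncated REINFORCE-style update. The two-state toy MDP, softmax tabular policy, horizon, batch size, and step size are illustrative assumptions, not taken from the paper or Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma = 0.9          # discount factor
H = 50               # truncation horizon (a source of bias in the estimate)
alpha = 0.1          # policy step size

# Toy MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((n_states, n_actions))   # softmax policy parameters

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def sample_trajectory(s0):
    """Roll out the current policy for H steps, recording (state, action, reward)."""
    s, traj = s0, []
    for _ in range(H):
        a = rng.choice(n_actions, p=policy(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj

def reinforce_gradient(traj):
    """Truncated gradient estimate: sum_t gamma^t * G_t * grad log pi(a_t | s_t)."""
    grad = np.zeros_like(theta)
    # discounted rewards-to-go G_t
    returns, G = [0.0] * len(traj), 0.0
    for t in reversed(range(len(traj))):
        G = traj[t][2] + gamma * G
        returns[t] = G
    for t, (s, a, _) in enumerate(traj):
        glog = -policy(s)
        glog[a] += 1.0                      # gradient of log-softmax at (s, a)
        grad[s] += (gamma ** t) * returns[t] * glog
    return grad

# One biased policy gradient ascent step from a batch of sampled trajectories.
batch = [sample_trajectory(s0=0) for _ in range(16)]
g_hat = np.mean([reinforce_gradient(tr) for tr in batch], axis=0)
theta += alpha * g_hat
```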
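The second estimator is the double-loop actor-critic, whose inner loop improves the value-function approximation via TD(0); the paper also studies TD(0) on Markov chains irrespective of the initial state distribution. The sketch below shows only that TD(0) inner loop, with linear function approximation run along a single Markov-chain trajectory from an arbitrary start state. The chain, reward vector, feature map, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, d = 5, 3
gamma = 0.9
beta = 0.05            # TD(0) step size

# Fixed Markov chain (the actor's policy already folded in) and per-state rewards.
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transition matrix
r = rng.uniform(0.0, 1.0, size=n_states)
Phi = rng.normal(size=(n_states, d))                   # state feature vectors

w = np.zeros(d)                  # critic weights: V_w(s) = Phi[s] @ w
s = rng.integers(n_states)       # arbitrary initial state, not a fixed restart distribution

for _ in range(20_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) semi-gradient update on the single ongoing trajectory
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += beta * delta * Phi[s]
    s = s_next

print("Learned value estimates:", Phi @ w)
```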