On the Second-Order Convergence of Biased Policy Gradient Algorithms
Authors: Siqiao Mu, Diego Klabjan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories, as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution. |
| Researcher Affiliation | Academia | (1) Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL; (2) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL. |
| Pseudocode | Yes | Algorithm 1: Biased Policy Gradient Algorithm |
| Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on experiments using specific datasets, thus no public dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical validation, therefore no training/validation/test dataset splits are specified. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments, therefore no specific software dependencies or versions are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not report on experiments, therefore no experimental setup details like hyperparameters or training settings are provided. |
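The abstract quoted above describes an inner loop in which a critic refines its value-function estimate via TD(0). As context for that claim, a minimal tabular TD(0) sketch is shown below; this is an illustrative reconstruction of the standard TD(0) update, not code from the paper, and all variable names are assumptions.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]  # temporal-difference error
    V[s] += alpha * td_error                 # incremental correction
    return V

# Toy 2-state chain: state 0 yields reward 1 and moves to state 1,
# which is absorbing with zero reward.
V = np.zeros(2)
for _ in range(200):
    V = td0_update(V, s=0, r=1.0, s_next=1)
    V = td0_update(V, s=1, r=0.0, s_next=1)
```

Under these dynamics the estimate V[0] converges geometrically toward the true value 1.0, matching the fixed-point behavior that TD(0) convergence results formalize.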