Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
Authors: Rui Yuan, Simon Shaolei Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper's main focus is the theoretical analysis of the NPG method. |
| Researcher Affiliation | Collaboration | Rui Yuan (FAIR, Meta AI; LTCI, Télécom Paris and Institut Polytechnique de Paris; yy42606r@gmail.com); Simon S. Du (University of Washington; ssdu@cs.washington.edu); Robert M. Gower (CCM, Flatiron Institute; gowerrobert@gmail.com); Alessandro Lazaric (FAIR, Meta AI; lazaric@meta.com); Lin Xiao (FAIR, Meta AI; linx@meta.com) |
| Pseudocode | Yes | Algorithm 1: Natural policy gradient; Algorithm 2: Q-Natural policy gradient; Algorithm 3: Sampler for $(s, a) \sim d^{\theta}(\nu)$ and unbiased estimate $\widehat{Q}_{s,a}(\theta)$ of $Q_{s,a}(\theta)$; Algorithm 4: Sampler for $(s, a) \sim d^{\theta}(\nu)$ and unbiased estimate $\widehat{A}_{s,a}(\theta)$ of $A_{s,a}(\theta)$; Algorithm 5: NPG-SGD; Algorithm 6: Q-NPG-SGD |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using publicly available datasets or provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments that would require dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments that would require specific hardware for execution. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments that would require specific ancillary software with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experimental setups or hyperparameter details. |