Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvári, Dale Schuurmans
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results to support these theoretical findings. |
| Researcher Affiliation | Collaboration | Jincheng Mei, Google DeepMind (jcmei@google.com); Bo Dai, Google DeepMind (bodai@google.com); Alekh Agarwal, Google Research (alekhagarwal@google.com); Mohammad Ghavamzadeh, Amazon (ghavamza@amazon.com); Csaba Szepesvári, Google DeepMind and University of Alberta (szepi@google.com); Dale Schuurmans, Google DeepMind and University of Alberta (daes@ualberta.ca) |
| Pseudocode | Yes | Algorithm 1 Softmax policy gradient (PG) [...] Algorithm 2 Natural policy gradient (NPG) |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or explicitly state that code is released. |
| Open Datasets | No | The paper uses custom-defined examples (Example 1, 2, 3, 4, 5) with specific matrices and reward vectors, but these are not publicly available datasets in the conventional sense, nor are any links or citations provided for their access. |
| Dataset Splits | No | The paper does not provide explicit training/validation/test splits for the small, custom-defined examples used in the simulations. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the simulations or experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | We run Softmax PG and NPG on Example 1 with the same θ1 = (6, 8)^T ∈ R^2. In Figure 1(a), the optimization trajectories show 85 iterations of NPG and 8.5 × 10^6 iterations of Softmax PG, both with learning rate η = 0.2. [...] The initialization is θ1 = (4, 10)^T, and η = 0.2. We run 150 iterations for NPG and 1.5 × 10^7 iterations for Softmax PG. [...] The initialization is θ1 = (10, 2)^T, and η = 0.2. We run 100 iterations for NPG and 2 × 10^6 iterations for Softmax PG. |
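
The experiment-setup row above describes running the paper's Algorithm 1 (Softmax PG) and Algorithm 2 (NPG) on small bandit examples with learning rate η = 0.2. A minimal sketch of both updates on a 2-action bandit is given below; the reward vector `r` is an assumed stand-in (Example 1's actual rewards are not reproduced in this report), and the iteration count is illustrative rather than taken from the paper.

```python
import numpy as np

def softmax(theta):
    """Numerically stabilized softmax policy over actions."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_pg_step(theta, r, eta):
    """Softmax PG step: exact gradient of the expected reward pi . r,
    where d(pi . r)/d(theta_a) = pi_a * (r_a - pi . r)."""
    pi = softmax(theta)
    return theta + eta * pi * (r - pi @ r)

def npg_step(theta, r, eta):
    """NPG step: preconditioning by the Fisher pseudoinverse reduces
    the update to the advantage r_a - pi . r."""
    pi = softmax(theta)
    return theta + eta * (r - pi @ r)

r = np.array([1.0, 0.8])                 # assumed rewards (not from the paper)
eta = 0.2                                # learning rate used in the paper
theta_pg = np.array([6.0, 8.0])          # theta_1 from Figure 1(a)
theta_npg = np.array([6.0, 8.0])

for _ in range(2000):
    theta_pg = softmax_pg_step(theta_pg, r, eta)
    theta_npg = npg_step(theta_npg, r, eta)

print(softmax(theta_npg) @ r)            # NPG reaches near-optimal reward
print(softmax(theta_pg) @ r)             # Softmax PG improves more slowly
```

This toy run illustrates the qualitative gap the paper reports: NPG needs orders of magnitude fewer iterations than Softmax PG, whose gradient vanishes quadratically as the policy approaches a deterministic corner.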