Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach

Authors: Shuang Wu, Ling Shi, Jun Wang, Guangjian Tian

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform numerical simulations to verify our theoretic results. In particular, we evaluate variants of PG algorithms on different settings and compare the performance optimality gap of every policy in each epoch during optimization.
Researcher Affiliation | Collaboration | Huawei Noah's Ark Lab; Hong Kong University of Science and Technology; University College London.
Pseudocode | No | The paper does not include a dedicated section for pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | No | The paper does not provide any links to open-source code or explicit statements about code availability.
Open Datasets | No | The paper references existing examples or environments, such as the 'controlled restart process' (Akbarzadeh & Mahajan, 2019), the 'binary chain' example in (Nota & Thomas, 2020), and the 10-by-10 grid world [Example 3.5, (Sutton & Barto, 2018)]. While these are specific experimental contexts, the paper does not provide concrete access information (e.g., specific URLs, DOIs, or repository names) for them as public datasets.
Dataset Splits | No | The paper performs numerical simulations and reports 'optimality gaps' across 'epochs', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU, GPU models) used for running the numerical simulations or experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | The step size for the policy gradient algorithms is set to 0.1 in all cases, and the temperature τ for entropy regularization is 10.
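
Since the paper releases neither code nor full environment specifications, the block below is a minimal sketch, in Python with NumPy, of how the two reported hyperparameters (step size 0.1, entropy temperature τ = 10) and the per-epoch optimality-gap comparison could be exercised. Everything else here is an assumption, not the authors' setup: the small random 3-state, 2-action tabular MDP, the discount factor 0.9, the epoch count, and the use of finite-difference gradients in place of an exact policy gradient are illustrative choices only.

```python
import numpy as np

# Hypothetical 3-state, 2-action tabular MDP; a stand-in for the paper's
# environments, which come from the cited references and are not reproduced here.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a] = expected reward
rho = np.full(n_states, 1.0 / n_states)                           # initial-state distribution

step_size, tau = 0.1, 10.0  # the two values reported in the Experiment Setup row

def softmax(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def regularized_return(theta):
    """Exact entropy-regularized discounted return of the softmax policy."""
    pi = softmax(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)                  # transition matrix under pi
    entropy = -(pi * np.log(pi + 1e-12)).sum(axis=1)
    r_pi = (pi * R).sum(axis=1) + tau * entropy            # reward plus entropy bonus
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return rho @ v

def unregularized_return(theta):
    """Plain discounted return of the softmax policy, used for the optimality gap."""
    pi = softmax(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, (pi * R).sum(axis=1))
    return rho @ v

# Optimal unregularized return via value iteration: the reference point for the gap.
v_star = np.zeros(n_states)
for _ in range(2000):
    v_star = (R + gamma * P @ v_star).max(axis=1)
j_star = rho @ v_star

# Gradient ascent on the regularized objective. Finite-difference gradients are a
# simplification used only to keep the sketch short; an actual PG implementation
# would use the exact (sampled or analytic) policy gradient instead.
theta, eps = np.zeros((n_states, n_actions)), 1e-5
for epoch in range(200):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        bump = np.zeros_like(theta)
        bump[idx] = eps
        grad[idx] = (regularized_return(theta + bump) - regularized_return(theta - bump)) / (2 * eps)
    theta += step_size * grad
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  optimality gap {j_star - unregularized_return(theta):.4f}")
```

Replacing the random MDP with the restart-process, binary-chain, or grid-world examples cited in the Open Datasets row would bring this sketch closer to the paper's actual experiments, but those environment definitions are not available from the paper itself.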