Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach

Authors: Shuang Wu, Ling Shi, Jun Wang, Guangjian Tian

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform numerical simulations to verify our theoretic results. In particular, we evaluate variants of PG algorithms on different settings and compare the performance optimality gap of every policy in each epoch during optimization.
Researcher Affiliation | Collaboration | Huawei Noah's Ark Lab; Hong Kong University of Science and Technology; University College London.
Pseudocode | No | The paper does not include a dedicated section for pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | No | The paper does not provide any links to open-source code or explicit statements about code availability.
Open Datasets | No | The paper references existing examples or environments, such as the 'controlled restart process' (Akbarzadeh & Mahajan, 2019), the 'binary chain' example in (Nota & Thomas, 2020), and the 10-by-10 grid world [Example 3.5, (Sutton & Barto, 2018)]. While these are specific experimental contexts, the paper does not provide concrete access information (e.g., specific URLs, DOIs, or repository names) for them as public datasets.
Dataset Splits | No | The paper performs numerical simulations and reports 'optimality gaps' across 'epochs', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU, GPU models) used for running the numerical simulations or experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | The step size for the policy gradient algorithms is set to 0.1 in all cases, and the temperature τ for entropy regularization is 10.
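
Since the paper releases neither code nor full environment specifications, the block below is a minimal sketch, in Python with NumPy, of how the two reported hyperparameters (step size 0.1, entropy temperature τ = 10) and the per-epoch optimality-gap comparison could be exercised. Everything else here is an assumption, not the authors' setup: the small random 3-state, 2-action tabular MDP, the discount factor 0.9, the epoch count, and the use of finite-difference gradients in place of an exact policy gradient are illustrative choices only.

```python
import numpy as np

# Hypothetical 3-state, 2-action tabular MDP; a stand-in for the paper's
# environments, which come from the cited references and are not reproduced here.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a] = expected reward
rho = np.full(n_states, 1.0 / n_states)                           # initial-state distribution

step_size, tau = 0.1, 10.0  # the two values reported in the Experiment Setup row

def softmax(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def regularized_return(theta):
    """Exact entropy-regularized discounted return of the softmax policy."""
    pi = softmax(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)                  # transition matrix under pi
    entropy = -(pi * np.log(pi + 1e-12)).sum(axis=1)
    r_pi = (pi * R).sum(axis=1) + tau * entropy            # reward plus entropy bonus
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return rho @ v

def unregularized_return(theta):
    """Plain discounted return of the softmax policy, used for the optimality gap."""
    pi = softmax(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, (pi * R).sum(axis=1))
    return rho @ v

# Optimal unregularized return via value iteration: the reference point for the gap.
v_star = np.zeros(n_states)
for _ in range(2000):
    v_star = (R + gamma * P @ v_star).max(axis=1)
j_star = rho @ v_star

# Gradient ascent on the regularized objective. Finite-difference gradients are a
# simplification used only to keep the sketch short; an actual PG implementation
# would use the exact (sampled or analytic) policy gradient instead.
theta, eps = np.zeros((n_states, n_actions)), 1e-5
for epoch in range(200):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        bump = np.zeros_like(theta)
        bump[idx] = eps
        grad[idx] = (regularized_return(theta + bump) - regularized_return(theta - bump)) / (2 * eps)
    theta += step_size * grad
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  optimality gap {j_star - unregularized_return(theta):.4f}")
```

Replacing the random MDP with the restart-process, binary-chain, or grid-world examples cited in the Open Datasets row would bring this sketch closer to the paper's actual experiments, but those environment definitions are not available from the paper itself.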