Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
Authors: Shuang Wu, Ling Shi, Jun Wang, Guangjian Tian
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform numerical simulations to verify our theoretical results. In particular, we evaluate variants of PG algorithms in different settings and compare the optimality gap of each policy at every epoch during optimization. (A minimal sketch of this protocol appears after the table.) |
| Researcher Affiliation | Collaboration | Huawei Noah's Ark Lab, Hong Kong University of Science and Technology, and University College London. |
| Pseudocode | No | The paper does not include a dedicated section for pseudocode or algorithm blocks with structured, code-like steps. |
| Open Source Code | No | The paper does not provide any links to open-source code or explicit statements about code availability. |
| Open Datasets | No | The paper references existing examples or environments like 'controlled restart process (Akbarzadeh & Mahajan, 2019)', 'binary chain example in (Nota & Thomas, 2020)', and '10-by-10 grid world [Example 3.5 (Sutton & Barto, 2018)]'. While these are specific contexts for experiments, the paper does not provide concrete access information (e.g., specific URLs, DOIs, or repository names) for these as public datasets. |
| Dataset Splits | No | The paper performs numerical simulations and shows 'optimality gaps' across 'epochs', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU, GPU models) used for running the numerical simulations or experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | The step size for the policy gradient algorithms is set to 0.1 in all cases. The temperature τ for entropy regularization is 10. (A sketch of these settings appears after the table.) |
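The "Research Type" and "Experiment Setup" rows together describe a standard protocol for tabular policy-gradient experiments: run PG on a small MDP and record the optimality gap at every epoch. The sketch below illustrates what such a protocol could look like; the 3-state MDP, the softmax parameterization, the epoch count, and all names (`evaluate`, `J_star`, etc.) are illustrative assumptions, not the paper's actual environments or code. Only the step size of 0.1 is taken from the paper.

```python
import numpy as np

np.random.seed(0)

# Hypothetical 3-state, 2-action MDP (illustrative, not from the paper).
nS, nA, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s']
R = np.random.uniform(0, 1, size=(nS, nA))           # R[s, a]
rho = np.ones(nS) / nS                               # initial state distribution

def softmax(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(pi):
    """Exact policy evaluation: V, Q, and the discounted state occupancy d."""
    P_pi = np.einsum('sa,sat->st', pi, P)            # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)                      # expected reward under pi
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V
    d = np.linalg.solve((np.eye(nS) - gamma * P_pi).T, rho)
    return V, Q, d

# Optimal return J* via value iteration, used for the optimality gap.
V_star = np.zeros(nS)
for _ in range(2000):
    V_star = (R + gamma * P @ V_star).max(axis=1)
J_star = rho @ V_star

alpha = 0.1                       # step size reported in the paper
theta = np.zeros((nS, nA))        # softmax policy parameters
for epoch in range(200):
    pi = softmax(theta)
    V, Q, d = evaluate(pi)
    gap = J_star - rho @ V        # optimality gap tracked at every epoch
    adv = Q - V[:, None]          # advantage A(s, a)
    grad = d[:, None] * pi * adv  # exact softmax policy gradient
    theta += alpha * grad
    if epoch % 50 == 0:
        print(f"epoch {epoch}: optimality gap = {gap:.4f}")
```

Because the MDP is tabular, the gradient is computed exactly from the advantage and occupancy rather than estimated from sampled trajectories, which matches the kind of controlled comparison the paper describes.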
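The paper also reports an entropy-regularization temperature of τ = 10. The snippet below sketches one way such a term could enter the update above: an entropy bonus gradient on the softmax logits, added to the policy-gradient term. The closed form follows from differentiating H(π) through the softmax; dropping the occupancy weighting on the entropy term is a simplification assumed here, not the paper's stated method.

```python
import numpy as np

tau = 10.0   # entropy temperature reported in the paper
alpha = 0.1  # step size reported in the paper

def softmax(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def entropy_bonus_grad(theta):
    """Gradient of the per-state policy entropy w.r.t. the softmax logits.

    For p = softmax(theta[s]), dH/dtheta[s, a] = -p_a * (log p_a + H(s)).
    """
    pi = softmax(theta)
    log_pi = np.log(pi)
    H = -(pi * log_pi).sum(axis=1, keepdims=True)
    return -pi * (log_pi + H)

# Regularized ascent step: policy-gradient term plus tau-weighted entropy term.
# `pg_grad` denotes the exact policy gradient from the previous sketch:
#   theta += alpha * (pg_grad + tau * entropy_bonus_grad(theta))
```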