Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
An Analytical Update Rule for General Policy Optimization
Authors: Hepeng Li, Nicholas Clavette, Haibo He
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present an analytical policy update rule that is independent of parametric function approximators. We prove that the update rule has a monotonic improvement guarantee and is suitable for optimizing general stochastic policies with continuous or discrete actions. The update rule provides a new theoretical foundation for policy-based RL, which traditionally restricts the policy search to a family of parametric functions, such as policy gradient (Sutton et al., 1999), deterministic policy gradient (Silver et al., 2014; Lillicrap et al., 2016), actor critic (Konda & Tsitsiklis, 1999; Degris et al., 2012), soft actor-critic (SAC) (Haarnoja et al., 2018a;b), and so on. Our update rule is derived from a closed-form solution to a trust region method using calculus of variation. |
| Researcher Affiliation | Academia | 1Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, South Kingstown, RI, USA. |
| Pseudocode | No | The paper focuses on theoretical derivations and proofs, not on providing pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention the use of any datasets for training experiments. |
| Dataset Splits | No | The paper is theoretical and does not discuss validation datasets or splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. |