An Analytical Update Rule for General Policy Optimization
Authors: Hepeng Li, Nicholas Clavette, Haibo He
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present an analytical policy update rule that is independent of parametric function approximators. We prove that the update rule has a monotonic improvement guarantee and is suitable for optimizing general stochastic policies with continuous or discrete actions. The update rule provides a new theoretical foundation for policy-based RL, which traditionally restricts the policy search to a family of parametric functions, such as policy gradient (Sutton et al., 1999), deterministic policy gradient (Silver et al., 2014; Lillicrap et al., 2016), actor-critic (Konda & Tsitsiklis, 1999; Degris et al., 2012), soft actor-critic (SAC) (Haarnoja et al., 2018a;b), and so on. Our update rule is derived from a closed-form solution to a trust region method using the calculus of variations. (A hedged sketch of this kind of closed-form update follows the table.) |
| Researcher Affiliation | Academia | Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, South Kingstown, RI, USA. |
| Pseudocode | No | The paper focuses on theoretical derivations and proofs rather than providing pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention the use of any datasets for training experiments. |
| Dataset Splits | No | The paper is theoretical and does not discuss validation datasets or splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. |
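
The paper provides no pseudocode, but the closed-form solution it describes is of the kind familiar from KL-constrained trust-region methods, where maximizing expected advantage subject to a divergence constraint yields a nonparametric reweighting of the old policy by exponentiated advantages. Below is a minimal sketch for the discrete-action case, not the paper's actual algorithm: the temperature `eta`, the function name `analytical_policy_update`, and the array shapes are illustrative assumptions.

```python
import numpy as np

def analytical_policy_update(pi_old, advantages, eta=1.0):
    """Sketch of a closed-form, nonparametric trust-region policy update.

    pi_old:     (num_states, num_actions) old action probabilities; rows sum to 1.
    advantages: (num_states, num_actions) advantage estimates A(s, a).
    eta:        temperature controlling how far the new policy may move
                (assumed fixed here; a KL constraint would determine it).

    Returns pi_new with pi_new(a|s) proportional to pi_old(a|s) * exp(A(s, a) / eta).
    """
    logits = advantages / eta
    # Subtract the per-state max before exponentiating for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    unnormalized = pi_old * np.exp(logits)
    # Renormalize each row so pi_new(.|s) is a valid distribution.
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Tiny usage example: two states, three actions.
pi_old = np.array([[0.5, 0.3, 0.2],
                   [0.1, 0.6, 0.3]])
advantages = np.array([[1.0, -0.5, 0.0],
                       [0.2, 0.2, -1.0]])
pi_new = analytical_policy_update(pi_old, advantages, eta=0.5)
print(pi_new.sum(axis=1))  # each row sums to 1
```

Because the update acts directly on the policy distribution rather than on parameters, it is consistent with the paper's claim of being independent of parametric function approximators; for continuous actions, the per-row normalizing sum would become an integral over the action space.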