Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality

Authors: Audrey Huang, Nan Jiang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | For the first time, we demonstrate how policy optimization can be conducted with (only) occupancy functions for both online and offline RL, and comprehensively analyze both local and global convergence. ... As our work is the first in this line of research and theoretical in nature, for future work we plan to launch empirical investigations of our methods, especially those for optimizing general functionals.
Researcher Affiliation | Academia | Audrey Huang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, audreyh5@illinois.edu; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, nanjiang@illinois.edu
Pseudocode | Yes | Algorithm 1 OCCUPG: Online Occupancy-based Policy Gradient; Algorithm 2 OFF-OCCUPG: Offline Occupancy-based Policy Gradient; Algorithm 3 Online Occupancy-based PG for General Functionals; Algorithm 4 Maximum Likelihood Estimation; Algorithm 5 Fitted Occupancy Iteration with Smooth Clipping
Open Source Code | No | The NeurIPS Paper Checklist states: "The answer NA means that paper does not include experiments requiring code. ... We do not believe that Figure 1 constitutes as an experiment that requires code."
Open Datasets | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not involve training models on datasets.
Dataset Splits | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not describe experimental validation splits.
Hardware Specification | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not mention any specific hardware used for computations or experiments.
Software Dependencies | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not list any specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not provide an experimental setup section with hyperparameters or training details.