Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality
Authors: Audrey Huang, Nan Jiang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | For the first time, we demonstrate how policy optimization can be conducted with (only) occupancy functions for both online and offline RL, and comprehensively analyze both local and global convergence. ... As our work is the first in this line of research and theoretical in nature, for future work we plan to launch empirical investigations of our methods, especially those for optimizing general functionals. |
| Researcher Affiliation | Academia | Audrey Huang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, audreyh5@illinois.edu; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, nanjiang@illinois.edu |
| Pseudocode | Yes | Algorithm 1 OCCUPG: Online Occupancy-based Policy Gradient; Algorithm 2 OFF-OCCUPG: Offline Occupancy-based Policy Gradient; Algorithm 3 Online Occupancy-based PG for General Functionals; Algorithm 4 Maximum Likelihood Estimation; Algorithm 5 Fitted Occupancy Iteration with Smooth Clipping |
| Open Source Code | No | The NeurIPS Paper Checklist states: "The answer NA means that paper does not include experiments requiring code. ... We do not believe that Figure 1 constitutes as an experiment that requires code." |
| Open Datasets | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not involve training models on datasets. |
| Dataset Splits | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not describe experimental validation splits. |
| Hardware Specification | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not mention any specific hardware used for computations or experiments. |
| Software Dependencies | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not list any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not provide an experimental setup section with hyperparameters or training details. |