Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality
Authors: Audrey Huang, Nan Jiang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | For the first time, we demonstrate how policy optimization can be conducted with (only) occupancy functions for both online and offline RL, and comprehensively analyze both local and global convergence. ... As our work is the first in this line of research and theoretical in nature, for future work we plan to launch empirical investigations of our methods, especially those for optimizing general functionals. |
| Researcher Affiliation | Academia | Audrey Huang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820 |
| Pseudocode | Yes | Algorithm 1 OCCUPG: Online Occupancy-based Policy Gradient; Algorithm 2 OFF-OCCUPG: Offline Occupancy-based Policy Gradient; Algorithm 3 Online Occupancy-based PG for General Functionals; Algorithm 4 Maximum Likelihood Estimation; Algorithm 5 Fitted Occupancy Iteration with Smooth Clipping |
| Open Source Code | No | The NeurIPS Paper Checklist states: "The answer NA means that paper does not include experiments requiring code. ... We do not believe that Figure 1 constitutes as an experiment that requires code." |
| Open Datasets | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not involve training models on datasets. |
| Dataset Splits | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not describe experimental validation splits. |
| Hardware Specification | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not mention any specific hardware used for computations or experiments. |
| Software Dependencies | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not list any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The NeurIPS Paper Checklist explicitly states: "The answer NA means that the paper does not include experiments." The paper is theoretical and does not provide an experimental setup section with hyperparameters or training details. |