Reinforcement Learning in Low-rank MDPs with Density Features
Authors: Audrey Huang, Jinglin Chen, Nan Jiang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set. |
| Researcher Affiliation | Academia | Audrey Huang\*¹, Jinglin Chen\*¹, Nan Jiang¹. ¹Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA. |
| Pseudocode | Yes | Algorithm 1 Fitted Occupancy Iteration with Clipping (FORC), Algorithm 2 FORC-guided Exploration (FORCE), Algorithm 3 Fitted Occupancy Iteration with Clipping and Representation Learning (FORCRL), Algorithm 4 FORCRL-guided Exploration (FORCRLE) |
| Open Source Code | No | The paper neither states that its code will be open-sourced nor links to a repository. |
| Open Datasets | No | The paper uses "offline data" and "dataset $D_{0:H-1}$" as abstract concepts in its theoretical framework and algorithms, but does not describe using any specific named or publicly accessible dataset with concrete access information. |
| Dataset Splits | No | This paper is theoretical and does not perform experiments that would require explicit training/validation/test dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical contributions and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper focuses on theoretical contributions and does not mention specific software dependencies with version numbers that would be required for experimental reproduction. |
| Experiment Setup | No | The paper focuses on theoretical contributions and does not describe an experimental setup with specific hyperparameters or system-level training settings. |