Reinforcement Learning in Low-rank MDPs with Density Features

Authors: Audrey Huang, Jinglin Chen, Nan Jiang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.
Researcher Affiliation | Academia | Audrey Huang*1, Jinglin Chen*1, Nan Jiang1. 1: Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA.
Pseudocode | Yes | Algorithm 1 Fitted Occupancy Iteration with Clipping (FORC), Algorithm 2 FORC-guided Exploration (FORCE), Algorithm 3 Fitted Occupancy Iteration with Clipping and Representation Learning (FORCRL), Algorithm 4 FORCRL-guided Exploration (FORCRLE)
Open Source Code | No | The paper does not contain any statement about making code open-source or provide links to a repository.
Open Datasets | No | The paper uses "offline data" and "dataset D_{0:H-1}" as abstract concepts in its theoretical framework and algorithms, but does not describe using any specific named or publicly accessible dataset with concrete access information.
Dataset Splits | No | This paper is theoretical and does not perform experiments that would require explicit training/validation/test dataset splits.
Hardware Specification | No | The paper focuses on theoretical contributions and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper focuses on theoretical contributions and does not mention specific software dependencies with version numbers that would be required for experimental reproduction.
Experiment Setup | No | The paper focuses on theoretical contributions and does not describe an experimental setup with specific hyperparameters or system-level training settings.