reproducibilityindex.ai

Reinforcement Learning in Low-rank MDPs with Density Features

Authors: Audrey Huang, Jinglin Chen, Nan Jiang

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.
Researcher Affiliation	Academia	Audrey Huang*¹, Jinglin Chen*¹, Nan Jiang¹; ¹Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA.
Pseudocode	Yes	Algorithm 1: Fitted Occupancy Iteration with Clipping (FORC); Algorithm 2: FORC-guided Exploration (FORCE); Algorithm 3: Fitted Occupancy Iteration with Clipping and Representation Learning (FORCRL); Algorithm 4: FORCRL-guided Exploration (FORCRLE).
Open Source Code	No	The paper contains no statement about releasing code as open source and provides no link to a repository.
Open Datasets	No	The paper uses "offline data" and "dataset D_{0:H-1}" as abstract concepts in its theoretical framework and algorithms, but does not describe using any specific named or publicly accessible dataset with concrete access information.
Dataset Splits	No	This paper is theoretical and does not perform experiments that would require explicit training/validation/test dataset splits.
Hardware Specification	No	The paper focuses on theoretical contributions and does not mention any specific hardware used for experiments.
Software Dependencies	No	The paper focuses on theoretical contributions and does not mention specific software dependencies with version numbers that would be required for experimental reproduction.
Experiment Setup	No	The paper focuses on theoretical contributions and does not describe an experimental setup with specific hyperparameters or system-level training settings.