Representation Learning for Online and Offline RL in Low-rank MDPs
Authors: Masatoshi Uehara, Xuezhou Zhang, Wen Sun
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work studies representation learning in RL: how can we learn a compact low-dimensional representation such that, on top of it, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al., 2020b), the state-of-the-art algorithm for representation learning in low-rank MDPs, we propose REP-UCB (Upper Confidence Bound driven REPresentation learning for RL), which significantly improves the sample complexity from Õ(A^9 d^7 / (ϵ^10 (1−γ)^22)) for FLAMBE to Õ(d^4 A^2 / (ϵ^2 (1−γ)^5)), where d is the rank of the transition matrix (i.e., the dimension of the ground-truth representation), A is the number of actions, and γ is the discount factor. Notably, REP-UCB is simpler than FLAMBE: it directly balances the interplay between representation learning, exploration, and exploitation, whereas FLAMBE is an explore-then-commit style approach that has to perform reward-free exploration step by step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm can compete against any policy as long as that policy is covered by the offline data distribution. |
| Researcher Affiliation | Academia | Masatoshi Uehara, Department of Computer Science, Cornell University, Ithaca, NY 14850, USA, mu223@cornell.edu; Xuezhou Zhang, Department of Electrical and Computer Engineering, Princeton University, NJ 08544, USA, xz7392@princeton.edu; Wen Sun, Department of Computer Science, Cornell University, Ithaca, NY 14850, USA, ws455@cornell.edu |
| Pseudocode | Yes | Algorithm 1, "UCB-driven representation learning, exploration, and exploitation (REP-UCB)", and Algorithm 2, "LCB-driven Representation Learning in offline RL (REP-LCB)". |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the methodology or links to a code repository. |
| Open Datasets | No | The paper describes the theoretical framework for online and offline RL in low-rank MDPs, including discussions of data format (e.g., quadruples D = {(s^(i), a^(i), r^(i), s'^(i))}_{i=1}^n), but does not refer to any specific, publicly available datasets by name, link, or citation. |
| Dataset Splits | No | The paper is theoretical and focuses on sample complexity bounds and algorithm design. It does not describe empirical experiments, and thus, no details on training, validation, or test dataset splits are provided. |
| Hardware Specification | No | The paper is a theoretical work focusing on algorithms and sample complexity. It does not describe any empirical experiments or the hardware used to run them. |
| Software Dependencies | No | The paper describes theoretical algorithms and their properties. It does not mention any specific software components or libraries with version numbers that would be required to reproduce experiments. |
| Experiment Setup | No | The paper presents theoretical algorithms (Algorithm 1 and 2) with abstract input parameters (e.g., 'Regularizer λn, parameter αn, Models M'). These are not concrete hyperparameter values or system-level training settings for an empirical experiment. |
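The headline result quoted above is the sample-complexity gap between FLAMBE and REP-UCB. As a rough sanity check on how large that gap is, the snippet below evaluates the leading terms of both bounds for one arbitrary choice of d, A, ϵ, and γ (logarithmic factors and constants hidden by the Õ notation are ignored; the specific numeric values are illustrative only, not from the paper):

```python
# Leading terms of the two sample-complexity bounds quoted in the abstract:
#   FLAMBE:  Õ(A^9 d^7 / (eps^10 (1 - gamma)^22))
#   REP-UCB: Õ(d^4 A^2 / (eps^2  (1 - gamma)^5))
# Values of d, A, eps, gamma below are arbitrary illustrative choices.
d, A, eps, gamma = 10, 5, 0.1, 0.9

flambe = A**9 * d**7 / (eps**10 * (1 - gamma)**22)
rep_ucb = d**4 * A**2 / (eps**2 * (1 - gamma)**5)

print(f"FLAMBE bound  ~ {flambe:.3e}")
print(f"REP-UCB bound ~ {rep_ucb:.3e}")
print(f"improvement   ~ {flambe / rep_ucb:.3e}x")
```

Even at these moderate parameter values, the polynomial improvement in ϵ, A, d, and the effective horizon 1/(1−γ) compounds into many orders of magnitude.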