Provably Efficient Algorithm for Nonstationary Low-Rank MDPs

Authors: Yuan Cheng, Jing Yang, Yingbin Liang

NeurIPS 2023

Reproducibility assessment. Each entry below gives the assessed variable, its result, and the LLM response supporting that result.

Research Type: Theoretical
LLM Response: In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains an unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to a parameter-free version, Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of the nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap (formalized in the sketch following these entries), which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.

Researcher Affiliation: Academia
LLM Response: Yuan Cheng, National University of Singapore (yuan.cheng@u.nus.edu); Jing Yang, The Pennsylvania State University (yangjing@psu.edu); Yingbin Liang, The Ohio State University (liang.889@osu.edu).

Pseudocode: Yes
LLM Response: Algorithm 1, PORTAL (Policy Optimization with RepresenTAtion Learning under nonstationary MDPs); Algorithm 2, E2U (Model Estimation and Exploration Policy Update); Algorithm 3, Ada-PORTAL (Adaptive Policy Optimization RepresenTAtion Learning). A hypothetical sketch of the control flow these names suggest follows these entries.

Open Source Code: No
LLM Response: The paper does not contain any statements about releasing code, nor does it provide any links to a code repository.

Open Datasets: No
LLM Response: This paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not conduct experiments on datasets or mention any public datasets.

Dataset Splits: No
LLM Response: This paper is theoretical and does not conduct experiments with datasets; therefore it does not mention training, validation, or test splits.

Hardware Specification: No
LLM Response: This is a theoretical paper proposing algorithms and providing theoretical guarantees. It does not describe any experimental setup that would require hardware specifications.

Software Dependencies: No
LLM Response: This is a theoretical paper and does not describe any experimental setup that would require specific software dependencies with version numbers.

Experiment Setup: No
LLM Response: This is a theoretical paper presenting algorithms and their theoretical guarantees. It does not include details about an experimental setup, such as hyperparameters or training settings.
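
For context, the quantity bounded in the abstract above, the average dynamic suboptimality gap, is typically formalized as follows in nonstationary RL. This is a sketch under the standard assumption of per-episode comparator policies; the paper's own notation may differ.

```latex
% Average dynamic suboptimality gap over K episodes (a standard
% nonstationary-RL formalization; the paper's notation may differ).
% \pi^{*,k} denotes an optimal policy for episode k, \pi^k the learner's
% policy for that episode, and V_1^{\pi,k}(s_1) the value of policy \pi
% under the episode-k transitions and rewards, starting from state s_1.
\[
  \mathrm{Gap}(K) \;=\; \frac{1}{K} \sum_{k=1}^{K}
  \Bigl( V_1^{\pi^{*,k},\,k}(s_1) \;-\; V_1^{\pi^k,\,k}(s_1) \Bigr)
\]
```

Because the comparator policy changes with k, driving this gap to zero is harder than minimizing regret against a single fixed policy, which is why the bounds must depend on the magnitude of the nonstationarity.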
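The pseudocode itself is not reproduced here, but the algorithm names imply a loop in which PORTAL (Algorithm 1) repeatedly calls an E2U-style subroutine (Algorithm 2) to re-estimate the low-rank model and refresh the exploration policy before collecting new data. Below is a minimal, self-contained Python sketch of that control flow under this assumption; every class, function, and update rule is an illustrative stand-in, not the paper's actual procedure.

```python
import random

class ToyModel:
    """Illustrative stand-in for an estimated low-rank MDP model."""

    def __init__(self, bias):
        self.bias = bias  # toy surrogate for the learned representation

    def fit_score(self, data):
        # Toy surrogate for a model-fit objective: prefer a bias close
        # to the empirical mean of the collected data.
        mean = sum(data) / len(data) if data else 0.0
        return -abs(self.bias - mean)

def e2u(data, candidate_biases):
    """E2U-style step (cf. Algorithm 2): re-estimate the model from all
    data collected so far and return a refreshed exploration policy
    (here reduced to a single sampling-width parameter)."""
    model = max((ToyModel(b) for b in candidate_biases),
                key=lambda m: m.fit_score(data))
    exploration_width = 1.0  # placeholder exploration policy
    return model, exploration_width

def portal(num_episodes, candidate_biases, seed=0):
    """PORTAL-style loop (cf. Algorithm 1): alternate model estimation
    with data collection; the environment drifts with the episode index
    k to mimic nonstationarity."""
    rng = random.Random(seed)
    data, policies = [], []
    for k in range(num_episodes):
        model, width = e2u(data, candidate_biases)
        data.append(rng.gauss(0.01 * k, width))  # one 'episode' of data
        policies.append(model.bias)  # record a per-episode policy proxy
    return policies

if __name__ == "__main__":
    print(portal(num_episodes=5, candidate_biases=[0.0, 0.5, 1.0]))
```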