Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning

Authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We investigate the performance of simple spectral-based matrix estimation approaches: we show that they efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error. These new results on low-rank matrix estimation make it possible to devise reinforcement learning algorithms that fully exploit the underlying low-rank structure. We provide two examples of such algorithms: a regret minimization algorithm for low-rank bandit problems, and a best policy identification algorithm for reward-free RL in low-rank MDPs. Both algorithms yield state-of-the-art performance guarantees.
Researcher Affiliation | Academia | Stefan Stojanovic, EECS, KTH, Stockholm, Sweden, stesto@kth.se; Yassir Jedra, EECS, KTH, Stockholm, Sweden, jedra@kth.se; Alexandre Proutiere, EECS, KTH, Stockholm, Sweden, alepro@kth.se
Pseudocode | Yes | Algorithm 1: Successive Matrix Estimation and Arm Elimination (SME-AE)
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository.
Open Datasets | No | The paper is theoretical and does not conduct empirical experiments with a specific named dataset. It discusses 'noisy observations of its entries' and 'samples of transitions of the chain' within its theoretical models, but these are abstract data concepts, not publicly available datasets.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not specify dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used to run experiments.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers used for experiments.
Experiment Setup | No | The paper is theoretical and does not include specific experimental setup details such as hyperparameter values, model initialization, or training schedules.
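
Illustrative sketch (not taken from the paper): the Research Type entry above refers to spectral matrix estimation from noisy observations of a low-rank matrix and to entry-wise error. The Python snippet below shows one generic way this idea is commonly illustrated, a rank-r truncated SVD of a fully observed noisy matrix. The function names `spectral_estimate` and `demo`, the matrix sizes, the Gaussian noise model, and the noise level are all assumptions chosen for illustration; the snippet is neither the authors' algorithm nor evidence about their guarantees.

```python
import numpy as np


def spectral_estimate(M_obs, rank):
    """Return the best rank-`rank` approximation of M_obs via truncated SVD.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    U, s, Vt = np.linalg.svd(M_obs, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


def demo(n=60, m=50, rank=3, noise_std=0.1, seed=0):
    """Build a synthetic low-rank matrix, add noise, and compare errors.

    All sizes and the Gaussian noise model are assumptions for this sketch.
    """
    rng = np.random.default_rng(seed)
    # Ground-truth matrix of rank `rank` (product of two thin factors).
    M = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, m))
    # Noisy observation of every entry.
    M_obs = M + noise_std * rng.normal(size=(n, m))
    M_hat = spectral_estimate(M_obs, rank)
    return {
        # Largest absolute error over all entries, before and after denoising.
        "max_entry_error_raw": float(np.max(np.abs(M_obs - M))),
        "max_entry_error_spectral": float(np.max(np.abs(M_hat - M))),
        "rank_of_estimate": int(np.linalg.matrix_rank(M_hat)),
    }


if __name__ == "__main__":
    for key, value in demo().items():
        print(f"{key}: {value}")
```

Running the script prints the maximum entry-wise error of the raw noisy observations next to that of the truncated-SVD estimate. In this synthetic setting the signal is much stronger than the noise, so the projection onto the top singular subspaces is expected to reduce the entry-wise error.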