Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning
Authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We investigate the performance of simple spectral-based matrix estimation approaches: we show that they efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error. These new results on low-rank matrix estimation make it possible to devise reinforcement learning algorithms that fully exploit the underlying low-rank structure. We provide two examples of such algorithms: a regret minimization algorithm for low-rank bandit problems, and a best policy identification algorithm for reward-free RL in low-rank MDPs. Both algorithms yield state-of-the-art performance guarantees. |
| Researcher Affiliation | Academia | Stefan Stojanovic (EECS, KTH, Stockholm, Sweden; stesto@kth.se); Yassir Jedra (EECS, KTH, Stockholm, Sweden; jedra@kth.se); Alexandre Proutiere (EECS, KTH, Stockholm, Sweden; alepro@kth.se) |
| Pseudocode | Yes | Algorithm 1: Successive Matrix Estimation and Arm Elimination (SME-AE) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical experiments with a specific named dataset. It discusses 'noisy observations of its entries' and 'samples of transitions of the chain' within its theoretical models, but these are abstract data concepts, not publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not specify dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers used for experiments. |
| Experiment Setup | No | The paper is theoretical and does not include specific experimental setup details such as hyperparameter values, model initialization, or training schedules. |
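
The Research Type entry above refers to spectral matrix estimation with small entry-wise error. As a rough illustration only, the sketch below applies a plain rank-r truncated SVD to a fully observed noisy matrix and compares entry-wise errors. It is not the paper's SME-AE algorithm or its sampling model; the function names `spectral_estimate` and `max_entrywise_error`, the matrix sizes, and the Gaussian noise level are all assumptions chosen for the example.

```python
import numpy as np


def spectral_estimate(noisy, rank):
    """Return the best rank-`rank` approximation of `noisy` via truncated SVD."""
    u, s, vt = np.linalg.svd(noisy, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def max_entrywise_error(estimate, truth):
    """Largest absolute difference between corresponding entries."""
    return float(np.max(np.abs(estimate - truth)))


# Hypothetical setup: a random rank-3 matrix observed with Gaussian noise.
rng = np.random.default_rng(0)
m, n, r = 60, 50, 3
truth = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
noisy = truth + 0.1 * rng.standard_normal((m, n))

estimate = spectral_estimate(noisy, r)
print("max entry-wise error, raw observations: ", max_entrywise_error(noisy, truth))
print("max entry-wise error, spectral estimate:", max_entrywise_error(estimate, truth))
```

In this toy setting, projecting onto the top singular subspaces discards most of the noise, so the estimate should sit closer to the true matrix than the raw observations do. The paper's guarantees concern more specific observation models and are not reproduced by this sketch.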