Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Authors: Ayush Sekhari, Christoph Dann, Mehryar Mohri, Yishay Mansour, Karthik Sridharan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{O}\big((H^{4d} K^{3d} \log \lvert \Pi \rvert)/\epsilon^2\big)$, where $H$ is the length of episodes, $K$ is the number of actions, $\Pi$ is the finite policy class, and $\epsilon > 0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting, showing that the exponential dependence on rank is unavoidable without further assumptions. |
| Researcher Affiliation | Collaboration | Christoph Dann (Google Research, cdann@cdann.net); Yishay Mansour (Google Research & Tel Aviv University, mansour.yishay@gmail.com); Mehryar Mohri (Google & Courant Institute, mohri@google.com); Ayush Sekhari (Cornell University, as3663@cornell.edu); Karthik Sridharan (Cornell University, ks999@cornell.edu) |
| Pseudocode | Yes | Algorithm 1 (Policy search algorithm). Input: horizon $H$, rank $d$, number of episodes $n$, finite policy class [...]; Algorithm 2 (Value estimation by autoregressive extrapolation). |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes a theoretical framework and algorithms for reinforcement learning; it does not use, refer to, or provide access information for any publicly available dataset. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments that would involve training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe an experimental setup or any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details such as hyperparameter values or training configurations. |
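To make the scaling in the Research Type row concrete, the sketch below evaluates only the leading term of the quoted bound, read as $H^{4d} K^{3d} \log\lvert\Pi\rvert / \epsilon^2$. This is an illustration under stated assumptions, not the paper's exact sample count: $\widetilde{O}$ hides constants and logarithmic factors, the exponent on $K$ is taken as written in the summary above, and the function name `bound_leading_term` is hypothetical. The point it shows is that each unit increase in the rank $d$ multiplies the term by $H^4 K^3$, which is the exponential dependence on rank that the lower bound says cannot be avoided in general.

```python
import math


def bound_leading_term(H: int, K: int, d: int, eps: float, num_policies: int) -> float:
    """Hypothetical helper: leading term H^(4d) * K^(3d) * log|Pi| / eps^2.

    Constants and the polylogarithmic factors hidden by the O-tilde are
    dropped, so the result is an order-of-magnitude illustration only.
    """
    if eps <= 0:
        raise ValueError("eps (desired sub-optimality) must be positive")
    if num_policies < 2:
        raise ValueError("need at least two policies for log|Pi| > 0")
    return (H ** (4 * d)) * (K ** (3 * d)) * math.log(num_policies) / eps ** 2


if __name__ == "__main__":
    H, K, eps, num_policies = 10, 2, 0.1, 1000
    for d in (1, 2, 3):
        # Each step in d multiplies the term by H^4 * K^3.
        print(d, f"{bound_leading_term(H, K, d, eps, num_policies):.3e}")
```

With $H = 10$ and $K = 2$, going from $d = 1$ to $d = 2$ multiplies the term by $H^4 K^3 = 80{,}000$ (up to floating-point rounding), whereas halving $\epsilon$ only multiplies it by 4 and growing $\lvert\Pi\rvert$ enters logarithmically.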