Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Authors: Ayush Sekhari, Christoph Dann, Mehryar Mohri, Yishay Mansour, Karthik Sridharan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\tilde{O}\left((H^{4d} K^{3d} \log \lvert\Pi\rvert)/\epsilon^2\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $\epsilon > 0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting that shows that the exponential dependence on rank is unavoidable, without further assumptions. |
| Researcher Affiliation | Collaboration | Christoph Dann (Google Research); Yishay Mansour (Google Research & Tel Aviv University); Mehryar Mohri (Google & Courant Institute); Ayush Sekhari (Cornell University); Karthik Sridharan (Cornell University) |
| Pseudocode | Yes | Algorithm 1 Policy search algorithm Input: horizon H, rank d, number of episodes n, finite policy class [...] Algorithm 2 Value estimation by autoregressive extrapolation |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes a theoretical framework and algorithms for Reinforcement Learning but does not refer to or provide access information for any specific publicly available or open datasets for training. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments that would involve training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental setup or hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details such as hyperparameter values or training configurations. |
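
To give a feel for the exponential dependence on rank mentioned in the Research Type row, the following minimal sketch evaluates the leading term of the quoted sample complexity bound, $H^{4d} K^{3d} \log \lvert\Pi\rvert / \epsilon^2$, with constants and logarithmic factors dropped. The function name `bound_scaling` and all numeric values are hypothetical and chosen only for illustration; the output shows how the bound scales, not an actual sample count from the paper.

```python
import math


def bound_scaling(H, K, d, log_policy_class_size, eps):
    """Leading term H^(4d) * K^(3d) * log|Pi| / eps^2.

    Constants and logarithmic factors hidden by the O-tilde notation
    are ignored, so only ratios between values are meaningful.
    """
    return (H ** (4 * d)) * (K ** (3 * d)) * log_policy_class_size / eps ** 2


if __name__ == "__main__":
    # Hypothetical setting: horizon 5, 2 actions, 1000 policies, eps = 0.1.
    for d in (1, 2, 3):
        value = bound_scaling(H=5, K=2, d=d,
                              log_policy_class_size=math.log(1000),
                              eps=0.1)
        print(f"d={d}: {value:.3e}")
```

Under these assumptions, each unit increase in the rank $d$ multiplies the leading term by $H^4 K^3$, which is the exponential dependence that the lower bound in the paper shows to be unavoidable without further assumptions.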