PAC Reinforcement Learning with Rich Observations
Authors: Akshay Krishnamurthy, Alekh Agarwal, John Langford
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation. (A schematic form of this guarantee is sketched after the table.) |
| Researcher Affiliation | Collaboration | Akshay Krishnamurthy, University of Massachusetts Amherst, Amherst, MA 01003 (akshay@cs.umass.edu); Alekh Agarwal, Microsoft Research, New York, NY 10011 (alekha@microsoft.com); John Langford, Microsoft Research, New York, NY 10011 (jcl@microsoft.com) |
| Pseudocode | Yes | The pseudocode for the algorithm, which we call Least Squares Value Elimination by Exploration (LSVEE), is displayed in Algorithm 1 (See also Appendix B). (A simplified, hedged sketch of this elimination-by-exploration loop appears below the table.) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper does not conduct empirical experiments with datasets; it focuses on theoretical analysis. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits. |
| Hardware Specification | No | The paper does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis, and does not include details about an experimental setup or hyperparameters. |
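
The sample-complexity claim quoted in the Research Type row can be written schematically. The symbols below are the paper's standard quantities under our reading (M hidden states per level, K actions, H horizon, accuracy ε, failure probability δ, policy class Π); the exact polynomial exponents appear in the paper's theorem and are deliberately left unspecified here.

```latex
% Schematic PAC guarantee for LSVEE (exponents intentionally unspecified):
% the number of episodes needed to find an \epsilon-optimal policy with
% probability at least 1 - \delta scales as
\[
  n_{\mathrm{episodes}} \;=\; \mathrm{poly}\!\left(M,\, K,\, H,\, \tfrac{1}{\epsilon}\right)
  \cdot \log\!\frac{|\Pi|}{\delta},
\]
% with no dependence on the size of the observation space.
```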
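
For readers who want a feel for what Algorithm 1 (LSVEE) does, the following is a minimal Python sketch of an elimination-by-exploration loop in the spirit of the paper's pseudocode. It is not the paper's Algorithm 1: the environment interface `collect_episode`, the fixed tolerance `tol`, and the consensus test are illustrative assumptions, and the real algorithm's elimination test, sample sizes, and exploration schedule are specified far more carefully in the paper.

```python
def lsvee_sketch(function_class, actions, horizon, collect_episode,
                 n_samples=100, tol=0.1):
    """Illustrative elimination-by-exploration loop in the spirit of LSVEE.

    `function_class`: finite list of candidate predictors f(x, a) -> float.
    `collect_episode(path, a)`: assumed helper that rolls in along the action
    sequence `path`, takes action `a`, and returns (x, r, x_next).
    These names are placeholders, not the paper's notation.
    """
    surviving = list(function_class)

    def explore(path):
        nonlocal surviving
        if len(path) == horizon:
            return
        for a in actions:
            # Roll in along `path`, take action `a`, and record transitions.
            data = [collect_episode(path, a) for _ in range(n_samples)]
            # Squared-loss Bellman-consistency test: keep only predictors whose
            # value at (x, a) matches the observed one-step look-ahead on average.
            surviving = [
                f for f in surviving
                if sum((f(x, a) - (r + max(f(xn, ap) for ap in actions))) ** 2
                       for (x, r, xn) in data) / len(data) <= tol
            ]
            # Recurse only while surviving predictors still disagree at the
            # reached state; this consensus check is what keeps the number of
            # explored paths small in the real algorithm.
            x0 = data[0][0]
            values = [max(f(x0, ap) for ap in actions) for f in surviving]
            if values and max(values) - min(values) > tol:
                explore(path + [a])

    explore([])
    # In the sketch, acting greedily with any surviving predictor is the output policy.
    return surviving
```

In the paper, the elimination threshold shrinks with the number of samples and a separate policy-selection step follows exploration; the sketch collapses those details into the single `tol` parameter.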