PAC Reinforcement Learning with Rich Observations

Authors: Akshay Krishnamurthy, Alekh Agarwal, John Langford

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that the algorithm learns near-optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.
Researcher Affiliation | Collaboration | Akshay Krishnamurthy (University of Massachusetts, Amherst, MA 01003; akshay@cs.umass.edu); Alekh Agarwal (Microsoft Research, New York, NY 10011; alekha@microsoft.com); John Langford (Microsoft Research, New York, NY 10011; jcl@microsoft.com)
Pseudocode | Yes | The pseudocode for the algorithm, which we call Least Squares Value Elimination by Exploration (LSVEE), is displayed in Algorithm 1 (see also Appendix B). (A hedged sketch of the elimination idea follows this table.)
Open Source Code | No | The paper does not provide access to source code for the methodology described.
Open Datasets | No | The paper does not conduct empirical experiments with datasets; it focuses on theoretical analysis.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits.
Hardware Specification | No | The paper does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis and does not include details about an experimental setup or hyperparameters.
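The Pseudocode row above points to Algorithm 1 (LSVEE), which the paper presents only as pseudocode. To give a rough flavor of the "least squares value elimination" idea, the following is a minimal, hypothetical sketch: given a batch of exploration data and a finite class of candidate value regressors, discard every regressor whose empirical squared loss exceeds the best achievable loss by more than a tolerance. The function names, data format, and threshold rule here are assumptions for illustration, not the paper's Algorithm 1.

```python
# Hypothetical sketch of a least-squares elimination step (NOT the paper's
# Algorithm 1). Each candidate f maps an observation x to a predicted value;
# data is a list of (x, target) pairs gathered during exploration, where
# target is an observed reward plus an estimated future value.

def eliminate(candidates, data, tolerance):
    """Keep only candidates whose empirical squared loss is near-optimal."""
    def risk(f):
        # Average squared prediction error of f on the exploration data.
        return sum((f(x) - target) ** 2 for x, target in data) / len(data)

    risks = {f: risk(f) for f in candidates}
    best = min(risks.values())
    # Retain every candidate within `tolerance` of the best empirical risk;
    # the intent is that a Bellman-consistent regressor survives elimination
    # with high probability when the tolerance is chosen appropriately.
    return [f for f in candidates if risks[f] <= best + tolerance]


# Toy usage with hypothetical constant predictors.
if __name__ == "__main__":
    candidates = [lambda x, v=v: v for v in (0.0, 0.5, 1.0)]
    data = [(None, 0.4), (None, 0.6), (None, 0.5)]
    survivors = eliminate(candidates, data, tolerance=0.05)
    print(len(survivors), "candidate(s) survive elimination")
```

In the paper this kind of regression-based elimination is interleaved with systematic exploration of the episodic process; the sketch above only illustrates the elimination step, not the exploration strategy or the analysis.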