PAC Reinforcement Learning with Rich Observations
Authors: Akshay Krishnamurthy, Alekh Agarwal, John Langford
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation. (A schematic form of this guarantee is sketched after the table.) |
| Researcher Affiliation | Collaboration | Akshay Krishnamurthy, University of Massachusetts Amherst, Amherst, MA 01003 (akshay@cs.umass.edu); Alekh Agarwal, Microsoft Research, New York, NY 10011 (alekha@microsoft.com); John Langford, Microsoft Research, New York, NY 10011 (jcl@microsoft.com) |
| Pseudocode | Yes | The pseudocode for the algorithm, which we call Least Squares Value Elimination by Exploration (LSVEE), is displayed in Algorithm 1 (See also Appendix B). (A simplified, hedged sketch of this elimination-by-exploration loop appears below the table.) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper does not conduct empirical experiments with datasets; it focuses on theoretical analysis. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits. |
| Hardware Specification | No | The paper does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis, and does not include details about an experimental setup or hyperparameters. |
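
The sample-complexity claim quoted in the Research Type row can be written schematically. The symbols below are the paper's standard quantities under our reading (M hidden states per level, K actions, H horizon, accuracy ε, failure probability δ, policy class Π); the exact polynomial exponents appear in the paper's theorem and are deliberately left unspecified here.

```latex
% Schematic PAC guarantee for LSVEE (exponents intentionally unspecified):
% the number of episodes needed to find an \epsilon-optimal policy with
% probability at least 1 - \delta scales as
\[
  n_{\mathrm{episodes}} \;=\; \mathrm{poly}\!\left(M,\, K,\, H,\, \tfrac{1}{\epsilon}\right)
  \cdot \log\!\frac{|\Pi|}{\delta},
\]
% with no dependence on the size of the observation space.
```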
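
For readers who want a feel for what Algorithm 1 (LSVEE) does, the following is a minimal Python sketch of an elimination-by-exploration loop in the spirit of the paper's pseudocode. It is not the paper's Algorithm 1: the environment interface `collect_episode`, the fixed tolerance `tol`, and the consensus test are illustrative assumptions, and the real algorithm's elimination test, sample sizes, and exploration schedule are specified far more carefully in the paper.

```python
def lsvee_sketch(function_class, actions, horizon, collect_episode,
                 n_samples=100, tol=0.1):
    """Illustrative elimination-by-exploration loop in the spirit of LSVEE.

    `function_class`: finite list of candidate predictors f(x, a) -> float.
    `collect_episode(path, a)`: assumed helper that rolls in along the action
    sequence `path`, takes action `a`, and returns (x, r, x_next).
    These names are placeholders, not the paper's notation.
    """
    surviving = list(function_class)

    def explore(path):
        nonlocal surviving
        if len(path) == horizon:
            return
        for a in actions:
            # Roll in along `path`, take action `a`, and record transitions.
            data = [collect_episode(path, a) for _ in range(n_samples)]
            # Squared-loss Bellman-consistency test: keep only predictors whose
            # value at (x, a) matches the observed one-step look-ahead on average.
            surviving = [
                f for f in surviving
                if sum((f(x, a) - (r + max(f(xn, ap) for ap in actions))) ** 2
                       for (x, r, xn) in data) / len(data) <= tol
            ]
            # Recurse only while surviving predictors still disagree at the
            # reached state; this consensus check is what keeps the number of
            # explored paths small in the real algorithm.
            x0 = data[0][0]
            values = [max(f(x0, ap) for ap in actions) for f in surviving]
            if values and max(values) - min(values) > tol:
                explore(path + [a])

    explore([])
    # In the sketch, acting greedily with any surviving predictor is the output policy.
    return surviving
```

In the paper, the elimination threshold shrinks with the number of samples and a separate policy-selection step follows exploration; the sketch collapses those details into the single `tol` parameter.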