Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PAC Reinforcement Learning with Rich Observations
Authors: Akshay Krishnamurthy, Alekh Agarwal, John Langford
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation. |
| Researcher Affiliation | Collaboration | Akshay Krishnamurthy, University of Massachusetts, Amherst, Amherst, MA 01003; Alekh Agarwal, Microsoft Research, New York, NY 10011; John Langford, Microsoft Research, New York, NY 10011 |
| Pseudocode | Yes | The pseudocode for the algorithm, which we call Least Squares Value Elimination by Exploration (LSVEE), is displayed in Algorithm 1 (see also Appendix B). |
| Open Source Code | No | The paper does not provide access to source code for the described method. |
| Open Datasets | No | The paper does not conduct empirical experiments with datasets; it focuses on theoretical analysis. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits. |
| Hardware Specification | No | The paper does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis, and does not include details about an experimental setup or hyperparameters. |