Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Authors: Christoph Dann, Emma Brunskill
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we derive an upper PAC bound O( |S|2|A|H2δ ) and a lower PAC bound Ω( |S||A|H2ϵ2 ln 1 δ+c) that match up to log-terms and an additional linear dependency on the number of states |S|. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein s inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least H3. |
| Researcher Affiliation | Academia | Christoph Dann Machine Learning Department Carnegie Mellon University cdann@cdann.net Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: UCFH: Upper-Confidence Fixed-Horizon episodic reinforcement learning algorithm |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release. |
| Open Datasets | No | The paper is theoretical and focuses on sample complexity bounds; it does not describe empirical experiments using a specific dataset. Figure 1 shows a theoretical 'class of a hard-to-learn finite horizon MDPs' for the lower bound proof, not an actual dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments requiring specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on deriving bounds and presenting an algorithm; it does not include an experimental setup with hyperparameters or training configurations. |