Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Authors: Christoph Dann, Emma Brunskill
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we derive an upper PAC bound O( |S|2|A|H2δ ) and a lower PAC bound Ω( |S||A|H2ϵ2 ln 1 δ+c) that match up to log-terms and an additional linear dependency on the number of states |S|. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein s inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least H3. |
| Researcher Affiliation | Academia | Christoph Dann Machine Learning Department Carnegie Mellon University EMAIL Emma Brunskill Computer Science Department Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1: UCFH: Upper-Confidence Fixed-Horizon episodic reinforcement learning algorithm |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release. |
| Open Datasets | No | The paper is theoretical and focuses on sample complexity bounds; it does not describe empirical experiments using a specific dataset. Figure 1 shows a theoretical 'class of a hard-to-learn finite horizon MDPs' for the lower bound proof, not an actual dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments requiring specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on deriving bounds and presenting an algorithm; it does not include an experimental setup with hyperparameters or training configurations. |