Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Authors: Christoph Dann, Emma Brunskill

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we derive an upper PAC bound O(|S|²|A|H²/ϵ² · ln(1/δ)) and a lower PAC bound Ω(|S||A|H²/ϵ² · ln(1/(δ+c))) that match up to log-terms and an additional linear dependency on the number of states |S|. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs, which have a time-horizon dependency of at least H³. (Both bounds are typeset below the table.)
Researcher Affiliation | Academia | Christoph Dann, Machine Learning Department, Carnegie Mellon University (cdann@cdann.net); Emma Brunskill, Computer Science Department, Carnegie Mellon University (ebrun@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: UCFH: Upper-Confidence Fixed-Horizon episodic reinforcement learning algorithm (a generic illustrative sketch of this style of algorithm is given after the table).
Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release.
Open Datasets | No | The paper is theoretical and focuses on sample complexity bounds; it does not describe empirical experiments using a specific dataset. Figure 1 shows a theoretical 'class of a hard-to-learn finite horizon MDPs' for the lower bound proof, not an actual dataset used for training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data splits for validation.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments requiring specific hardware specifications.
Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers for replication.
Experiment Setup | No | The paper is theoretical and focuses on deriving bounds and presenting an algorithm; it does not include an experimental setup with hyperparameters or training configurations.
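
For readability, the two PAC bounds quoted in the Research Type row are typeset here as reconstructed from the paper's abstract. Here ϵ is the target accuracy, δ the allowed failure probability, and c the constant appearing in the quoted lower bound.

```latex
% PAC bounds quoted from the abstract of Dann & Brunskill (2015)
\text{Upper bound: } O\!\left(\frac{|S|^2\,|A|\,H^2}{\epsilon^2}\,\ln\frac{1}{\delta}\right)
\qquad
\text{Lower bound: } \Omega\!\left(\frac{|S|\,|A|\,H^2}{\epsilon^2}\,\ln\frac{1}{\delta + c}\right)
```

As the quoted abstract states, the two bounds match up to log-terms and one additional linear factor of |S| in the upper bound.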
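The Pseudocode row refers to Algorithm 1 (UCFH). The sketch below is not UCFH itself: per the abstract, UCFH leverages Bernstein's inequality (variance-dependent confidence sets over the transition model), which is not reproduced here. This is only a minimal, hypothetical illustration of the general "optimism in the face of uncertainty" template for fixed-horizon episodic MDPs, using a simple Hoeffding-style bonus; all function and variable names are invented for this example.

```python
import numpy as np


def optimistic_fixed_horizon_vi(num_states, num_actions, horizon,
                                visit_counts, reward_sums, transition_counts,
                                delta=0.1):
    """Backward-induction value iteration with a Hoeffding-style optimism bonus.

    Hypothetical illustration only -- NOT the paper's UCFH (Algorithm 1), which
    uses Bernstein-based confidence sets over the transition probabilities.
    Rewards are assumed to lie in [0, 1], so values are clipped at the
    remaining horizon.
    """
    n = np.maximum(visit_counts, 1)                # N(s, a), floored at 1
    r_hat = reward_sums / n                        # empirical mean rewards
    p_hat = transition_counts / n[:, :, None]      # empirical transition model
    bonus = horizon * np.sqrt(                     # simple optimism bonus
        np.log(2.0 * num_states * num_actions * horizon / delta) / (2.0 * n)
    )

    q = np.zeros((horizon + 1, num_states, num_actions))
    v = np.zeros((horizon + 1, num_states))
    for h in range(horizon - 1, -1, -1):           # backward over time steps
        q[h] = np.minimum(r_hat + p_hat @ v[h + 1] + bonus, horizon - h)
        v[h] = q[h].max(axis=1)
    policy = q[:horizon].argmax(axis=2)            # greedy w.r.t. optimistic Q
    return policy, v[0]
```

An agent would alternate between running the returned time-dependent policy for one episode and updating the empirical counts; this mirrors the overall loop of optimistic episodic algorithms without reproducing UCFH's specific confidence sets or update schedule.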