reproducibilityindex.ai

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Authors: Christoph Dann, Emma Brunskill

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we derive an upper PAC bound O( \|S\|2\|A\|H2δ ) and a lower PAC bound Ω( \|S\|\|A\|H2ϵ2 ln 1 δ+c) that match up to log-terms and an additional linear dependency on the number of states \|S\|. The lower bound is the ﬁrst of its kind for this setting. Our upper bound leverages Bernstein s inequality to improve on previous bounds for episodic ﬁnite-horizon MDPs which have a time-horizon dependency of at least H3.
Researcher Affiliation	Academia	Christoph Dann Machine Learning Department Carnegie Mellon University cdann@cdann.net Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu
Pseudocode	Yes	Algorithm 1: UCFH: Upper-Confidence Fixed-Horizon episodic reinforcement learning algorithm
Open Source Code	No	The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release.
Open Datasets	No	The paper is theoretical and focuses on sample complexity bounds; it does not describe empirical experiments using a specific dataset. Figure 1 shows a theoretical 'class of a hard-to-learn finite horizon MDPs' for the lower bound proof, not an actual dataset used for training.
Dataset Splits	No	The paper is theoretical and does not describe empirical experiments with data splits for validation.
Hardware Specification	No	The paper is theoretical and does not describe empirical experiments requiring specific hardware specifications.
Software Dependencies	No	The paper is theoretical and does not mention any software dependencies with specific version numbers for replication.
Experiment Setup	No	The paper is theoretical and focuses on deriving bounds and presenting an algorithm; it does not include an experimental setup with hyperparameters or training configurations.