Information-Theoretic Confidence Bounds for Reinforcement Learning

Authors: Xiuyuan Lu, Benjamin Van Roy

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.
Researcher Affiliation | Academia | Xiuyuan Lu (Stanford University, lxy@stanford.edu); Benjamin Van Roy (Stanford University, bvr@stanford.edu)
Pseudocode | Yes | Algorithm 1: Thompson Sampling
    1: Input: prior p
    2: for ℓ = 1, 2, . . . , L do
    3:     Sample: θ̂ℓ ∼ p
    4:     Act: Aℓ = arg max_{a ∈ A}
    5:     Observe: Y_{ℓ,Aℓ}
    6:     Update: p ← P(θ | p, Aℓ, Y_{ℓ,Aℓ})
    7: end for
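Algorithm 1 in the paper is stated in general form (sample a model from the posterior, act greedily with respect to the sample, then update the posterior). A minimal sketch of this loop for a Bernoulli bandit with conjugate Beta(1, 1) priors — the bandit setup, function name, and parameters here are illustrative assumptions, not the paper's experimental code:

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Minimal Thompson sampling sketch for a Bernoulli bandit (assumed setup).

    true_means: unknown per-arm success probabilities (used only to simulate rewards)
    horizon:    number of interaction periods L
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta posterior parameters per arm, starting from a uniform Beta(1, 1) prior.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Sample: draw a model theta_hat from the current posterior p.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act: choose the arm that is optimal under the sampled model.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe: a Bernoulli reward from the true (hidden) environment.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Update: conjugate Beta posterior update for the played arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total_reward, alpha, beta
```

The Beta-Bernoulli pairing keeps the posterior update a one-line conjugate step; the paper's analysis applies to richer model classes (linear bandits, tabular and factored MDPs), where the sample and update steps are correspondingly more involved.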
Open Source Code | No | The paper does not provide any specific links to source code repositories or explicitly state that the code for the described methodology is publicly available.
Open Datasets | No | This paper is theoretical and focuses on analysis and derivation of bounds rather than empirical evaluation on specific datasets. Therefore, no concrete access information for a publicly available or open dataset is provided.
Dataset Splits | No | This is a theoretical paper that focuses on analytical bounds and framework development rather than empirical experiments. As such, no specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing is provided.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments.
Software Dependencies | No | The paper does not provide any specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper describes theoretical frameworks and their applications to different problem types (linear bandits, tabular MDPs, factored MDPs) but does not include concrete hyperparameter values, training configurations, or system-level settings for a practical experimental setup.