An Information-Theoretic Analysis for Thompson Sampling with Many Actions

Authors: Shi Dong, Benjamin Van Roy

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We establish new bounds that depend instead on a notion of rate-distortion. Among other things, this allows us to recover through information-theoretic arguments a near-optimal bound for the linear bandit. We also offer a bound for the logistic bandit that dramatically improves on the best previously available, though this bound depends on an information-theoretic statistic that we have only been able to quantify via computation. and Theorem 1. Let {Θk}K k=1 be any partition of Θ such that for any k = 1, . . . , K and θ, θ Θk, d(θ, θ ) ϵ. Let ψ be defined as in (4) and let θ t and θt satisfy the conditions in Proposition 2. We have Bayes Regret(T; πTS) q Γ I(θ ; ψ) T + ϵ T,
Researcher Affiliation Academia Shi Dong Stanford University sdong15@stanford.edu Benjamin Van Roy Stanford University bvr@stanford.edu
Pseudocode No The paper describes theoretical concepts and proofs without including any structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets No The paper is theoretical and defines problem settings (linear bandits, logistic bandits) rather than using specific, named datasets. The 'simulated information ratio values' in Figure 1 are results from a computation to support a conjecture, not data from a publicly accessible dataset.
Dataset Splits No The paper is theoretical and does not describe experimental validation on datasets with explicit splits.
Hardware Specification No The paper does not specify any hardware details for its theoretical analysis or the computational results presented in Figure 1.
Software Dependencies No The paper does not mention specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not provide specific experimental setup details, hyperparameters, or training configurations.