Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Authors: Julian Zimmert, Tor Lattimore

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We started to follow this plan, successfully improving existing minimax bounds for bandits with graph feedback and online linear optimisation for ℓp-balls with full information (the bandit setting remains a mystery). Along the way, however, we noticed a striking connection between the analysis techniques for bounding the information ratio and controlling the stability of online stochastic mirror descent (OSMD), which is a classical algorithm for online convex optimisation. (The Figure 1 caption, quoted in full under Experiment Setup below, documents the accompanying experiments.)
Researcher Affiliation | Collaboration | Julian Zimmert (DeepMind, London / University of Copenhagen, zimmert@di.ku.dk); Tor Lattimore (DeepMind, London, lattimore@google.com)
Pseudocode | Yes | The paper states both algorithms as pseudocode:

Algorithm 1: OSMD
  Input: A = (P, E, F) and η
  Initialize X1 = arg min_{a ∈ 𝒳} F(a)
  for t = 1, …, n do
    Sample At ∼ P_{Xt} and observe Φt
    Construct: ℓ̂t = E(Xt, At, Φt)
    Update: Xt+1 = ft(Xt, At)

Algorithm 2: MTS
  Input: Prior ν and P
  Initialize X1 = E[A∗]
  for t = 1, …, n do
    Sample At ∼ P_{Xt} and observe Φt
    Update: Xt+1 = E_{t+1}[A∗]
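To make the listings concrete, here is a minimal Python sketch of both algorithms specialised to k-armed Bernoulli bandit losses. The instantiations are illustrative assumptions rather than the paper's exact choices: for OSMD we take the negative-entropy potential F (so the mirror step becomes an exponential-weights update) and the standard importance-weighted loss estimator as a stand-in for E; for MTS we take independent Beta(1, 1) priors in place of a general prior ν.

```python
import numpy as np

def osmd_negentropy(means, n, eta, rng):
    """Sketch of Algorithm 1 (OSMD) on a k-armed Bernoulli bandit with the
    negative-entropy potential F(x) = sum_i x_i log x_i, under which the
    mirror-descent update is an exponential-weights step."""
    k = len(means)
    x = np.full(k, 1.0 / k)             # X1 = argmin of F over the simplex: uniform
    total_loss = 0.0
    for _ in range(n):
        a = rng.choice(k, p=x)          # Sample At ~ P_{Xt}; here P_x samples from x
        loss = rng.binomial(1, means[a])    # observe Phi_t
        total_loss += loss
        lhat = np.zeros(k)
        lhat[a] = loss / x[a]           # importance-weighted estimate (stand-in for E)
        x = x * np.exp(-eta * lhat)     # mirror step for the entropic potential
        x /= x.sum()                    # normalise back onto the simplex
    return total_loss

def mts_bernoulli(means, n, rng):
    """Sketch of Algorithm 2 (MTS) with independent Beta(1, 1) priors on the
    Bernoulli loss means.  Identifying actions with simplex corners, Xt is the
    vector of posterior optimality probabilities, and playing the argmin of a
    single posterior draw is the same as sampling At ~ Xt."""
    k = len(means)
    alpha = np.ones(k)                  # Beta posterior parameters per arm
    beta = np.ones(k)
    total_loss = 0.0
    for _ in range(n):
        theta = rng.beta(alpha, beta)   # one posterior draw of the loss means
        a = int(np.argmin(theta))       # equivalent to sampling At ~ P_{Xt}
        loss = rng.binomial(1, means[a])
        total_loss += loss
        alpha[a] += loss                # posterior update, i.e. Xt+1 = E_{t+1}[A∗]
        beta[a] += 1 - loss
    return total_loss
```

Both sketches return the cumulative realised loss, from which pseudo-regret can be estimated as in the Figure 1 setup quoted below.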
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to code repositories.
Open Datasets | No | The paper mentions using 'Bernoulli losses with E[ℓt] = (0.45, 0.55, …, 0.55)ᵀ (k = 5)' for experiments, which describes synthetic data generation rather than a public dataset with concrete access information such as a link, DOI, or formal citation.
Dataset Splits | No | The paper does not provide dataset split information (e.g., percentages, sample counts, or references to standard splits) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models, memory, or cloud computing instances used for running the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | Figure 1: Comparison of INF with and without shifted loss estimators. x-axis is the number of time steps and y-axis the empirical regret estimate. η is tuned to the horizon and all experiments use Bernoulli losses with E[ℓt] = (0.45, 0.55, …, 0.55)ᵀ (k = 5). We repeat the experiment 100 times with error bars indicating three standard deviations. The empirical result matches our theoretical improvement of a factor 2.
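A hedged sketch of this protocol follows, assuming the policy sketches above (or any drop-in INF implementation; the paper's shifted loss estimators are not reconstructed here). The helper name `empirical_regret` is ours, not the paper's, and the error band reads the caption's "three standard deviations" literally.

```python
import numpy as np

def empirical_regret(policy, means, n, repeats=100, seed=0):
    """Estimate expected regret of `policy` on a fixed Bernoulli-loss instance,
    averaged over independent repeats, with a three-standard-deviation band as
    in the Figure 1 caption.  `policy(means, n, rng)` must return the
    cumulative realised loss over n rounds."""
    rng = np.random.default_rng(seed)
    best = n * min(means)               # expected cumulative loss of the best arm
    regrets = np.array([policy(means, n, rng) - best for _ in range(repeats)])
    return regrets.mean(), 3.0 * regrets.std(ddof=1)

# The instance from the caption: k = 5 arms, one slightly better arm.
means = [0.45, 0.55, 0.55, 0.55, 0.55]
# Example with the OSMD sketch above; eta = sqrt(log(k)/n) is a standard
# horizon-tuned choice, not necessarily the paper's:
# avg, band = empirical_regret(
#     lambda m, n, rng: osmd_negentropy(m, n, np.sqrt(np.log(len(m)) / n), rng),
#     means, n=10_000)
```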