Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Authors: Julian Zimmert, Tor Lattimore
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We started to follow this plan, successfully improving existing minimax bounds for bandits with graph feedback and online linear optimisation for ℓ_p-balls with full information (the bandit setting remains a mystery). Along the way, however, we noticed a striking connection between the analysis techniques for bounding the information ratio and controlling the stability of online stochastic mirror descent (OSMD), which is a classical algorithm for online convex optimisation. Figure 1: Comparison of INF with and without shifted loss estimators. The x-axis is the number of time-steps and the y-axis the empirical regret estimate. η is tuned to the horizon and all experiments use Bernoulli losses with E[ℓ_t] = (0.45, 0.55, …, 0.55)^T (k = 5). We repeat the experiment 100 times with error bars indicating three standard deviations. The empirical result matches our theoretical improvement of a factor of 2. |
| Researcher Affiliation | Collaboration | Julian Zimmert (DeepMind, London / University of Copenhagen, zimmert@di.ku.dk); Tor Lattimore (DeepMind, London, lattimore@google.com) |
| Pseudocode | Yes | Algorithm 1 (OSMD). Input: A = (P, E, F) and η. Initialize X_1 = argmin_{a ∈ 𝒳} F(a). For t = 1, …, n: sample A_t ~ P_{X_t} and observe Φ_t; construct ℓ̂_t = E(X_t, A_t, Φ_t); update X_{t+1} = f_t(X_t, A_t). Algorithm 2 (MTS). Input: prior ν and P. Initialize X_1 = E[A*]. For t = 1, …, n: sample A_t ~ P_{X_t} and observe Φ_t; update X_{t+1} = E_t[A*]. (A concrete instantiation of the OSMD loop is sketched in Python after this table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to code repositories. |
| Open Datasets | No | The paper mentions using 'Bernoulli losses with E[ℓ_t] = (0.45, 0.55, …, 0.55)^T (k = 5)' for experiments, which describes synthetic data generation rather than a public dataset with concrete access information like a link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or references to standard splits) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | Figure 1: Comparison of INF with and without shifted loss estimators. The x-axis is the number of time-steps and the y-axis the empirical regret estimate. η is tuned to the horizon and all experiments use Bernoulli losses with E[ℓ_t] = (0.45, 0.55, …, 0.55)^T (k = 5). We repeat the experiment 100 times with error bars indicating three standard deviations. The empirical result matches our theoretical improvement of a factor of 2. (A minimal harness mirroring this setup is sketched after this table.) |
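
For context, the following is a minimal Python sketch of how the abstract OSMD loop of Algorithm 1 can be instantiated for k-armed adversarial bandits with the 1/2-Tsallis-entropy potential, i.e. an INF-style policy of the kind compared in Figure 1. The class name `TsallisInf`, the Newton-based normalisation, and the standard importance-weighted loss estimator are illustrative assumptions: the paper states Algorithms 1 and 2 abstractly, and its shifted loss estimator is not reproduced here.

```python
import numpy as np


class TsallisInf:
    """Sketch of OSMD/FTRL over the simplex with the 1/2-Tsallis-entropy potential.

    Illustrative instantiation of the abstract loop in Algorithm 1: the sampling
    rule P draws an arm from the current iterate X_t, the estimator E is the
    standard importance-weighted loss estimate, and the update minimises the
    regularised cumulative estimated loss.
    """

    def __init__(self, k, eta, rng=None):
        self.k = k                       # number of arms
        self.eta = eta                   # learning rate (tuned to the horizon)
        self.cum_loss_est = np.zeros(k)  # cumulative loss estimates \hat{L}_{t-1}
        self.rng = rng or np.random.default_rng()

    def _distribution(self):
        # Solve x_i = (eta * L_i + lam)^(-2) with sum_i x_i = 1 by Newton's method.
        L = self.eta * self.cum_loss_est
        lam = 1.0 - L.min()              # guarantees L_i + lam >= 1 > 0
        for _ in range(50):
            inv = 1.0 / (L + lam)
            f = np.sum(inv ** 2) - 1.0
            if abs(f) < 1e-12:
                break
            lam -= f / (-2.0 * np.sum(inv ** 3))  # Newton step on the normaliser
        x = 1.0 / (L + lam) ** 2
        return x / x.sum()               # guard against residual numerical error

    def act(self):
        """Sample A_t ~ P_{X_t}."""
        self.x = self._distribution()
        return self.rng.choice(self.k, p=self.x)

    def update(self, arm, loss):
        """Importance-weighted estimate: only the played arm's loss is observed."""
        self.cum_loss_est[arm] += loss / self.x[arm]
```

Algorithm 2 (MTS) is not instantiated here because it additionally requires a prior ν and a posterior model for the optimal action A*.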
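And here is a sketch of an experiment harness mirroring the quoted setup: i.i.d. Bernoulli losses with mean vector (0.45, 0.55, …, 0.55) for k = 5, 100 repetitions, and an error bar taken as three standard deviations of the mean-regret estimate (the caption does not say whether the bars are per-run or standard-error based, so that choice is an assumption). The names `run_bandit_experiment` and `UniformPolicy` are hypothetical; `UniformPolicy` is only a placeholder so the harness runs stand-alone, whereas reproducing Figure 1 would mean plugging in the INF-style policy above, with and without the paper's shifted loss estimator.

```python
import numpy as np


def run_bandit_experiment(policy_factory, means, horizon, n_repeats=100, seed=0):
    """Estimate the pseudo-regret of a bandit policy on i.i.d. Bernoulli losses.

    Mirrors the quoted setup: the experiment is repeated n_repeats times and the
    summary reports the mean regret with an error bar of three standard
    deviations of the mean estimate (assumed convention, see lead-in).
    """
    means = np.asarray(means, dtype=float)
    best = means.min()                        # expected loss of the best arm
    regrets = np.zeros(n_repeats)
    for rep in range(n_repeats):
        rng = np.random.default_rng(seed + rep)
        policy = policy_factory(len(means), rng)
        total = 0.0
        for _ in range(horizon):
            losses = (rng.random(len(means)) < means).astype(float)  # Bernoulli draw
            arm = policy.act()
            policy.update(arm, losses[arm])
            total += means[arm] - best        # pseudo-regret increment
        regrets[rep] = total
    return regrets.mean(), 3.0 * regrets.std(ddof=1) / np.sqrt(n_repeats)


class UniformPolicy:
    """Placeholder policy with the act/update interface; plays arms uniformly."""

    def __init__(self, k, rng):
        self.k, self.rng = k, rng

    def act(self):
        return self.rng.integers(self.k)

    def update(self, arm, loss):
        pass


if __name__ == "__main__":
    means = [0.45] + [0.55] * 4               # E[ell_t] = (0.45, 0.55, ..., 0.55), k = 5
    mean_regret, err = run_bandit_experiment(UniformPolicy, means, horizon=10_000)
    print(f"uniform baseline regret: {mean_regret:.1f} +/- {err:.1f}")
```

To approximate the INF curve one would pass, for example, `lambda k, rng: TsallisInf(k, eta=0.01, rng=rng)` as the policy factory, with η tuned to the horizon as in the figure caption; the particular value here is only illustrative.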