Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Authors: Julian Zimmert, Tor Lattimore
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We started to follow this plan, successfully improving existing minimax bounds for bandits with graph feedback and online linear optimisation for ℓp-balls with full information (the bandit setting remains a mystery). Along the way, however, we noticed a striking connection between the analysis techniques for bounding the information ratio and controlling the stability of online stochastic mirror descent (OSMD), which is a classical algorithm for online convex optimisation. Figure 1: Comparison of INF with and without shifted loss estimators. x-axis is number of time-steps and y-axis the empirical regret estimation. η is tuned to the horizon and all experiments use Bernoulli losses with E[ℓt] = (0.45, 0.55, . . . , 0.55)T (k = 5). We repeat the experiment 100 times with error bars indicating three standard deviations. The empirical result matches our theoretical improvement of a factor 2. |
| Researcher Affiliation | Collaboration | Julian Zimmert Deep Mind, London/ University of Copenhagen EMAIL Tor Lattimore Deep Mind, London EMAIL |
| Pseudocode | Yes | Algorithm 1: OSMD Input: A = (P, E, F) and η Initialize X1 = arg mina X F(a) for t = 1, . . . , n do Sample At PXt and observe Φt Construct: ˆℓt = E(Xt, At, Φt) Update: Xt+1 = ft(Xt, At) Algorithm 2: MTS Input: Prior ν and P Initialize X1 = E[A ] for t = 1, . . . , n do Sample At PXt and observe Φt Update: Xt+1 = Et 1[A ] |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to code repositories. |
| Open Datasets | No | The paper mentions using 'Bernoulli losses with E[ℓt] = (0.45, 0.55, . . . , 0.55)T (k = 5)' for experiments, which describes synthetic data generation rather than a public dataset with concrete access information like a link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or references to standard splits) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | Figure 1: Comparison of INF with and without shifted loss estimators. x-axis is number of time-steps and y-axis the empirical regret estimation. η is tuned to the horizon and all experiments use Bernoulli losses with E[ℓt] = (0.45, 0.55, . . . , 0.55)T (k = 5). We repeat the experiment 100 times with error bars indicating three standard deviations. The empirical result matches our theoretical improvement of a factor 2. |