Meta-Thompson Sampling

Authors: Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory is complemented by empirical evaluation, which shows that Meta-TS quickly adapts to the unknown prior.
Researcher Affiliation | Collaboration | 1 Google Research, 2 University of Alberta, 3 DeepMind.
Pseudocode | Yes | The pseudocode for Meta-TS is presented in Algorithm 1.
Open Source Code | No | The paper provides no statement or link regarding the availability of its source code.
Open Datasets | No | The paper uses synthetic experiments and does not refer to a publicly available dataset with concrete access information. The text states: "Our theoretical results are complemented by synthetic experiments..." and "We experiment with three problems." The data are custom-generated for each specific problem.
Dataset Splits | No | The paper describes a bandit problem with sequential interactions (m tasks of n rounds each) and synthetic data generation, not a fixed dataset with traditional training, validation, and test splits. Therefore, no validation-split information is provided.
Hardware Specification | No | The paper gives no specific details about the hardware used to run the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper does not specify any software names with version numbers that would be needed for reproducibility.
Experiment Setup | Yes | We experiment with three problems. In each problem, we have m = 20 tasks with a horizon of n = 200 rounds. All results are averaged over 100 runs, where P ∼ Q in each run. [...] The meta-prior width is σ_q = 0.5, the instance-prior width is σ_0 = 0.1, and the reward noise is σ = 1. [...] We sample arm features uniformly at random from [-0.5, 0.5]^d.
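The experiment-setup parameters above (m = 20 tasks, n = 200 rounds, σ_q = 0.5, σ_0 = 0.1, σ = 1) can be sketched as a synthetic Gaussian bandit simulation. This is a minimal illustration, not the paper's code: the number of arms K, the random seed, and the meta-posterior update (a per-arm Gaussian approximation rather than the paper's exact Meta-TS update) are our own assumptions.

```python
import numpy as np

# Hypothetical sketch of the synthetic Meta-TS setup; all names are ours.
rng = np.random.default_rng(0)

K = 5          # number of arms (assumed; not fixed by the quoted text)
m = 20         # number of tasks (from the paper)
n = 200        # rounds per task (from the paper)
sigma_q = 0.5  # meta-prior width
sigma_0 = 0.1  # instance-prior width
sigma = 1.0    # reward-noise standard deviation

mu_star = rng.uniform(-0.5, 0.5, K)  # unknown meta-parameter

# Meta-posterior over mu_star starts at the meta-prior N(0, sigma_q^2 I).
meta_mean, meta_var = np.zeros(K), np.full(K, sigma_q**2)

for task in range(m):
    # Each task's true arm means are drawn around the meta-parameter
    # (P ~ Q): instance prior N(mu_star, sigma_0^2 I).
    mu_task = rng.normal(mu_star, sigma_0)
    # Meta-TS step: sample a prior mean from the meta-posterior, then
    # run ordinary Thompson sampling within the task.
    post_mean = rng.normal(meta_mean, np.sqrt(meta_var))
    post_var = np.full(K, sigma_0**2)
    for t in range(n):
        theta = rng.normal(post_mean, np.sqrt(post_var))
        a = int(np.argmax(theta))                 # pull the sampled best arm
        r = rng.normal(mu_task[a], sigma)         # observe a noisy reward
        # Conjugate Gaussian posterior update for the pulled arm.
        new_var = 1.0 / (1.0 / post_var[a] + 1.0 / sigma**2)
        post_mean[a] = new_var * (post_mean[a] / post_var[a] + r / sigma**2)
        post_var[a] = new_var
    # Simplified meta-posterior update: treat each arm's task posterior
    # mean as a noisy observation of mu_star (a crude stand-in for the
    # paper's exact update).
    obs_var = sigma_0**2 + post_var
    new_meta_var = 1.0 / (1.0 / meta_var + 1.0 / obs_var)
    meta_mean = new_meta_var * (meta_mean / meta_var + post_mean / obs_var)
    meta_var = new_meta_var
```

After the m tasks the meta-posterior variance has contracted well below the meta-prior width σ_q^2, which is the "quickly adapts to the unknown prior" behavior the table's first row describes.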