Meta-Thompson Sampling
Authors: Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is complemented by empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior. |
| Researcher Affiliation | Collaboration | ¹Google Research, ²University of Alberta, ³DeepMind. |
| Pseudocode | Yes | The pseudocode for MetaTS is presented in Algorithm 1. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | No | The paper uses synthetic experiments and does not refer to a publicly available dataset with concrete access information. The text states: "Our theoretical results are complemented by synthetic experiments..." and "We experiment with three problems." These are custom-generated data for the specific problems. |
| Dataset Splits | No | The paper describes a bandit problem with sequential interactions (m tasks, n rounds) and synthetic data generation, not a fixed dataset with traditional training, validation, and test splits. Therefore, no specific validation dataset split information is provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not specify any software names with version numbers that would be needed for reproducibility. |
| Experiment Setup | Yes | We experiment with three problems. In each problem, we have m = 20 tasks with a horizon of n = 200 rounds. All results are averaged over 100 runs, where P∗ ∼ Q in each run. [...] The meta-prior width is σ_q = 0.5, the instance prior width is σ_0 = 0.1, and the reward noise is σ = 1. [...] We sample arm features uniformly at random from [−0.5, 0.5]^d. |
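
Since no source code is released, the quoted setup can be made concrete with a rough sketch of how one synthetic run might be generated. This is not the authors' code. The constants m, n, σ_q, σ_0, σ and the feature range come from the Experiment Setup row; everything else is an assumption made for illustration: a linear Gaussian reward model, a zero-mean Gaussian meta-prior, the hypothetical helper names `sample_run` and `pull`, and the default values of `d` and `num_arms`.

```python
import numpy as np

# Values quoted in the Experiment Setup row.
M_TASKS = 20    # m: number of tasks per run
N_ROUNDS = 200  # n: horizon of each task
N_RUNS = 100    # results are averaged over this many runs
SIGMA_Q = 0.5   # meta-prior width
SIGMA_0 = 0.1   # instance prior width
SIGMA = 1.0     # reward noise


def sample_run(rng, d=2, num_arms=5):
    """Sample one synthetic run (d and num_arms are assumed, not quoted)."""
    # Assumption: the instance prior mean is drawn from a zero-mean
    # Gaussian meta-prior with standard deviation SIGMA_Q per coordinate.
    prior_mean = rng.normal(0.0, SIGMA_Q, size=d)
    # Assumption: each task's parameter vector is drawn around that mean
    # with standard deviation SIGMA_0.
    task_params = rng.normal(prior_mean, SIGMA_0, size=(M_TASKS, d))
    # Arm features sampled uniformly at random from [-0.5, 0.5]^d.
    arm_features = rng.uniform(-0.5, 0.5, size=(num_arms, d))
    return prior_mean, task_params, arm_features


def pull(rng, theta, x):
    """Noisy linear reward for arm feature x under task parameter theta."""
    return float(x @ theta + rng.normal(0.0, SIGMA))
```

A bandit algorithm would then call `pull` for `N_ROUNDS` rounds in each of the `M_TASKS` tasks, and the whole procedure would be repeated `N_RUNS` times with a fresh `sample_run` draw each time.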