Thompson Sampling for Complex Online Problems

Authors: Aditya Gopalan, Shie Mannor, Yishay Mansour

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Using particle filters for computing posterior distributions which lack an explicit closed-form, we present numerical results for the performance of Thompson sampling for subset-selection and job scheduling problems. We evaluate the performance of Thompson sampling (Algorithm 1) on two complex bandit settings (a) Playing subsets of arms with the MAX reward function, and (b) Job scheduling over machines to minimize makespan.
Researcher Affiliation Academia Aditya Gopalan ADITYA@EE.TECHNION.AC.IL Department of Electrical Engineering, Technion Israel Institute of Technology, Haifa 32000, Israel Shie Mannor SHIE@EE.TECHNION.AC.IL Department of Electrical Engineering, Technion Israel Institute of Technology, Haifa 32000, Israel Yishay Mansour MANSOUR@TAU.AC.IL School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
Pseudocode Yes Algorithm 1 Thompson Sampling
Open Source Code No The paper does not provide any explicit statements about the release of source code for the described methodology or a link to a code repository.
Open Datasets No The paper describes synthetic experimental setups for 'Subset Plays, MAX Reward' and 'Job Scheduling' experiments, generating data based on specified parameters (e.g., 'arms reward parameters are taken to be equi-spaced in (0, 1)', 'job means taken to be equally spaced in [0, N]'). It does not mention using or providing access to any publicly available or open datasets for training.
Dataset Splits No The paper describes synthetic data generation within the experimental setup but does not specify explicit training, validation, or test dataset splits in the conventional sense for pre-existing datasets.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies No The paper mentions algorithms and methods like 'particle filter' and 'UCB', but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup Yes We assume the setup of Section 4.2 where one plays a size-M subset in each round and observes the maximum value. The individual arms reward parameters are taken to be equi-spaced in (0, 1). It is observed that Thompson sampling outperforms standard decoupled UCB by a wide margin in the cases we consider (Figure 1, left and center). The prior is uniform over the discrete domain {0.1, 0.3, 0.5, 0.7, 0.9}N, with the arms means lying in the same domain (setting of Theorem 1). The regret is averaged across 10 runs, and the confidence intervals shown are 1 standard deviation.