Thompson Sampling for Complex Online Problems
Authors: Aditya Gopalan, Shie Mannor, Yishay Mansour
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using particle filters for computing posterior distributions which lack an explicit closed-form, we present numerical results for the performance of Thompson sampling for subset-selection and job scheduling problems. We evaluate the performance of Thompson sampling (Algorithm 1) on two complex bandit settings (a) Playing subsets of arms with the MAX reward function, and (b) Job scheduling over machines to minimize makespan. |
| Researcher Affiliation | Academia | Aditya Gopalan ADITYA@EE.TECHNION.AC.IL Department of Electrical Engineering, Technion Israel Institute of Technology, Haifa 32000, Israel Shie Mannor SHIE@EE.TECHNION.AC.IL Department of Electrical Engineering, Technion Israel Institute of Technology, Haifa 32000, Israel Yishay Mansour MANSOUR@TAU.AC.IL School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel |
| Pseudocode | Yes | Algorithm 1 Thompson Sampling |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes synthetic experimental setups for 'Subset Plays, MAX Reward' and 'Job Scheduling' experiments, generating data based on specified parameters (e.g., 'arms reward parameters are taken to be equi-spaced in (0, 1)', 'job means taken to be equally spaced in [0, N]'). It does not mention using or providing access to any publicly available or open datasets for training. |
| Dataset Splits | No | The paper describes synthetic data generation within the experimental setup but does not specify explicit training, validation, or test dataset splits in the conventional sense for pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and methods like 'particle filter' and 'UCB', but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We assume the setup of Section 4.2 where one plays a size-M subset in each round and observes the maximum value. The individual arms reward parameters are taken to be equi-spaced in (0, 1). It is observed that Thompson sampling outperforms standard decoupled UCB by a wide margin in the cases we consider (Figure 1, left and center). The prior is uniform over the discrete domain {0.1, 0.3, 0.5, 0.7, 0.9}N, with the arms means lying in the same domain (setting of Theorem 1). The regret is averaged across 10 runs, and the confidence intervals shown are 1 standard deviation. |