Thompson Sampling for Budgeted Multi-Armed Bandits
Authors: Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a set of numerical simulations with different rewards/costs distributions and different number of arms. The simulation results demonstrate the effectiveness of the proposed algorithm. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, Hefei, China 2 University of Chinese Academy of Sciences, Beijing, China 3Microsoft Research, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Budgeted Thompson Sampling (BTS) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code. |
| Open Datasets | No | The paper uses simulated data based on Bernoulli and Multinomial distributions and does not refer to any public, accessible datasets with specific access information. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It describes simulations with randomly chosen parameters and running experiments for 500 times. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | For comparison purpose, we implement four baseline algorithms: (1) the ϵ-first algorithm [Tran-Thanh et al., 2010] with ϵ = 0.1; (2) a variant of the PD-Bw K algorithm... ϕ(x, N) = p νx /N and ν = 0.25 log(BK); (3) the UCB-BV1 algorithm [Ding et al., 2013]; (4) a variant of the KUBE algorithm... We simulate bandits with two different distributions: one is the Bernoulli distribution (simple), and the other is the multinomial distribution (complex). Their parameters are randomly chosen. For each distribution, we simulate a 10-armed case and a 100-armed case. We then independently run the experiments for 500 times... B = {100, 200, 500, 1K, 2K, 5K, 10K, 15K, 20K, ..., 50K}. |