Thompson Sampling for Budgeted Multi-Armed Bandits

Authors: Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct a set of numerical simulations with different rewards/costs distributions and different number of arms. The simulation results demonstrate the effectiveness of the proposed algorithm.
Researcher Affiliation Collaboration 1University of Science and Technology of China, Hefei, China 2 University of Chinese Academy of Sciences, Beijing, China 3Microsoft Research, Beijing, China
Pseudocode Yes Algorithm 1 Budgeted Thompson Sampling (BTS)
Open Source Code No The paper does not provide any explicit statement or link for open-source code.
Open Datasets No The paper uses simulated data based on Bernoulli and Multinomial distributions and does not refer to any public, accessible datasets with specific access information.
Dataset Splits No The paper does not provide specific training/validation/test dataset splits. It describes simulations with randomly chosen parameters and running experiments for 500 times.
Hardware Specification No The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies No The paper does not mention any specific software dependencies with version numbers.
Experiment Setup Yes For comparison purpose, we implement four baseline algorithms: (1) the ϵ-first algorithm [Tran-Thanh et al., 2010] with ϵ = 0.1; (2) a variant of the PD-Bw K algorithm... ϕ(x, N) = p νx /N and ν = 0.25 log(BK); (3) the UCB-BV1 algorithm [Ding et al., 2013]; (4) a variant of the KUBE algorithm... We simulate bandits with two different distributions: one is the Bernoulli distribution (simple), and the other is the multinomial distribution (complex). Their parameters are randomly chosen. For each distribution, we simulate a 10-armed case and a 100-armed case. We then independently run the experiments for 500 times... B = {100, 200, 500, 1K, 2K, 5K, 10K, 15K, 20K, ..., 50K}.