reproducibilityindex.ai

Thompson Sampling for Budgeted Multi-Armed Bandits

Authors: Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a set of numerical simulations with different rewards/costs distributions and different number of arms. The simulation results demonstrate the effectiveness of the proposed algorithm.
Researcher Affiliation	Collaboration	1University of Science and Technology of China, Hefei, China 2 University of Chinese Academy of Sciences, Beijing, China 3Microsoft Research, Beijing, China
Pseudocode	Yes	Algorithm 1 Budgeted Thompson Sampling (BTS)
Open Source Code	No	The paper does not provide any explicit statement or link for open-source code.
Open Datasets	No	The paper uses simulated data based on Bernoulli and Multinomial distributions and does not refer to any public, accessible datasets with specific access information.
Dataset Splits	No	The paper does not provide specific training/validation/test dataset splits. It describes simulations with randomly chosen parameters and running experiments for 500 times.
Hardware Specification	No	The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies	No	The paper does not mention any specific software dependencies with version numbers.
Experiment Setup	Yes	For comparison purpose, we implement four baseline algorithms: (1) the ϵ-ﬁrst algorithm [Tran-Thanh et al., 2010] with ϵ = 0.1; (2) a variant of the PD-Bw K algorithm... ϕ(x, N) = p νx /N and ν = 0.25 log(BK); (3) the UCB-BV1 algorithm [Ding et al., 2013]; (4) a variant of the KUBE algorithm... We simulate bandits with two different distributions: one is the Bernoulli distribution (simple), and the other is the multinomial distribution (complex). Their parameters are randomly chosen. For each distribution, we simulate a 10-armed case and a 100-armed case. We then independently run the experiments for 500 times... B = {100, 200, 500, 1K, 2K, 5K, 10K, 15K, 20K, ..., 50K}.