Thresholding Bandit with Optimal Aggregate Regret

Authors: Chao Tao, Saúl Blanco, Jian Peng, Yuan Zhou

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, each listed with its result and the supporting LLM response:

Research Type: Experimental
LLM Response: "We perform additional experiments that due to space limitations are included in Appendix F.3. In all setups, LSA outperforms its competitors with various parameter choices." and "In our experiments, we assume that each arm follows independent Bernoulli distributions with different means. To guarantee a fair comparison, we vary the total number of samples T and compare the empirical average aggregate regret on a logarithmic scale which is averaged over 5000 independent runs."

Researcher Affiliation: Academia
LLM Response:
- Chao Tao, Computer Science Department, Indiana University at Bloomington
- Saúl A. Blanco, Computer Science Department, Indiana University at Bloomington
- Jian Peng, Computer Science Department, University of Illinois at Urbana-Champaign
- Yuan Zhou, Computer Science Department, Indiana University at Bloomington; Department of ISE, University of Illinois at Urbana-Champaign

Pseudocode: Yes
LLM Response:
Algorithm 1 Logarithmic-Sample Algorithm, LSA(S, θ)
1: Input: a set of arms S = [K], threshold θ
2: Initialization: pull each arm once
3: for t = K + 1 to T do
4:     pull arm i_t = argmin_{i ∈ S} { α T_i(t-1) (Δ̂_i(t-1))² + 0.5 ln T_i(t-1) }
5: for each arm i ∈ S, let d_i ← 1 if θ̂_i(T) ≥ θ and d_i ← 0 otherwise
6: Output: (d_1, ..., d_K)
Here T_i(t-1) is the number of pulls of arm i before round t, θ̂_i(t-1) is its empirical mean, and Δ̂_i(t-1) = |θ̂_i(t-1) - θ| is its empirical gap to the threshold.

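A minimal Python sketch of this pseudocode, assuming Bernoulli arms as in the paper's experiments; the function name lsa and its signature are my own, not from the paper:

    import numpy as np

    def lsa(theta_true, T, theta=0.5, alpha=1.35, rng=None):
        """Sketch of Algorithm 1 (LSA), assuming Bernoulli(theta_true[i]) arms."""
        rng = np.random.default_rng() if rng is None else rng
        theta_true = np.asarray(theta_true, dtype=float)
        K = len(theta_true)
        pulls = np.ones(K)                                # T_i: pull counts per arm
        sums = rng.binomial(1, theta_true).astype(float)  # line 2: pull each arm once
        for _ in range(K + 1, T + 1):                     # lines 3-4
            gaps = np.abs(sums / pulls - theta)           # empirical gaps Delta_hat_i
            index = alpha * pulls * gaps**2 + 0.5 * np.log(pulls)
            i = int(np.argmin(index))                     # arm minimizing the LSA index
            sums[i] += rng.binomial(1, theta_true[i])
            pulls[i] += 1
        # line 5: d_i = 1 iff the final empirical mean clears the threshold
        return (sums / pulls >= theta).astype(int)

Intuitively, the first term of the index grows when an arm has been pulled often and sits far from θ (so it is likely classified correctly), while the 0.5 ln T_i term stays small for under-sampled arms, so near-threshold or little-explored arms keep getting pulled.
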
Open Source Code: No
LLM Response: The paper does not provide any explicit statement or link for open-source code availability.

Open Datasets: No
LLM Response: In our experiments, we assume that each arm follows independent Bernoulli distributions with different means. To guarantee a fair comparison, we vary the total number of samples T and compare the empirical average aggregate regret on a logarithmic scale which is averaged over 5000 independent runs. We consider three different choices of {θ_i}_{i ∈ S} (a data-generation sketch follows this list):
1. (arithmetic progression I) K = 10; θ_{1:4} = 0.2 + (0:3)·0.05, θ_5 = 0.45, θ_6 = 0.55, and θ_{7:10} = 0.65 + (0:3)·0.05 (see Setup 1 in Figure 1).
2. (arithmetic progression II) K = 20; θ_i = 0.405 + (i-1)/100 for i = 1, ..., 20 (see Setup 2 in Figure 1).
3. (two-group setting) K = 10; θ_{1:5} = 0.45 and θ_{6:10} = 0.505 (see Setup 3 in Figure 1).

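Since the "datasets" are purely synthetic, they can be rebuilt from the quoted description alone. A sketch of the three mean-vector configurations; the dictionary name SETUPS and the helper draw_rewards are my own:

    import numpy as np

    # The three Bernoulli mean vectors quoted above; arm i yields Bernoulli(theta_i) rewards.
    SETUPS = {
        "setup1_arithmetic_I":  np.r_[0.2 + np.arange(4) * 0.05,    # theta_{1:4}
                                      0.45, 0.55,                   # theta_5, theta_6
                                      0.65 + np.arange(4) * 0.05],  # theta_{7:10}
        "setup2_arithmetic_II": 0.405 + np.arange(20) / 100,        # theta_i = 0.405 + (i-1)/100
        "setup3_two_group":     np.r_[np.full(5, 0.45), np.full(5, 0.505)],
    }

    def draw_rewards(theta_true, T, rng):
        """One run's reward table: T i.i.d. Bernoulli draws per arm, shape (K, T)."""
        theta_true = np.asarray(theta_true, dtype=float)
        return rng.binomial(1, theta_true[:, None], size=(len(theta_true), T))
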
Dataset Splits: No
LLM Response: The paper describes generating data from Bernoulli distributions for a bandit problem and performing 5000 independent runs, but does not define traditional train/validation/test splits for a fixed dataset.

Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware used for running the experiments.

Software Dependencies: No
LLM Response: The paper discusses algorithms and parameters but does not list any specific software dependencies with version numbers.

Experiment Setup: Yes
LLM Response: In our experiments, we fix θ = 0.5. We notice that the choice of α in our LSA is quite robust (see Appendix F.4 for experimental results). To illustrate the performance, we fix α = 1.35 in LSA and compare it with four existing algorithms for the TBP problem under a variety of settings. Now we discuss these algorithms and their parameter settings in more detail. We consider three different choices of {θ_i}_{i ∈ S} (the same three setups listed above; a regret-evaluation sketch follows this list):
1. (arithmetic progression I) K = 10; θ_{1:4} = 0.2 + (0:3)·0.05, θ_5 = 0.45, θ_6 = 0.55, and θ_{7:10} = 0.65 + (0:3)·0.05 (see Setup 1 in Figure 1).
2. (arithmetic progression II) K = 20; θ_i = 0.405 + (i-1)/100 for i = 1, ..., 20 (see Setup 2 in Figure 1).
3. (two-group setting) K = 10; θ_{1:5} = 0.45 and θ_{6:10} = 0.505 (see Setup 3 in Figure 1).
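
Putting the pieces together, a sketch of the evaluation loop, reusing the lsa and SETUPS sketches above. It takes the empirical aggregate regret to be the average fraction of misclassified arms over independent runs, which is my reading of the paper's metric; the paper averages over 5000 runs, fewer are used here to keep the sketch fast:

    import numpy as np

    def empirical_aggregate_regret(theta_true, T, runs=1000, theta=0.5, alpha=1.35):
        """Average fraction of misclassified arms over `runs` independent runs."""
        rng = np.random.default_rng(0)
        theta_true = np.asarray(theta_true, dtype=float)
        truth = theta_true >= theta            # ground-truth labels 1{theta_i >= theta}
        total = 0.0
        for _ in range(runs):
            d = lsa(theta_true, T, theta=theta, alpha=alpha, rng=rng)
            total += np.mean(d != truth)       # per-run fraction of wrong labels
        return total / runs

    for name, thetas in SETUPS.items():
        for T in (1000, 2000, 4000):           # vary the total sample budget T
            r = empirical_aggregate_regret(thetas, T)
            print(f"{name}  T={T}  regret~{r:.4f}")

To reproduce the paper's figures one would sweep T over a wider range and plot the regret on a logarithmic scale, alongside the four competing TBP algorithms the paper compares against.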