Thresholding Bandits with Augmented UCB
Authors: Subhojyoti Mukherjee, Naveen Kolar Purushothama, Nandan Sudarsanam, Balaraman Ravindran
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive simulation experiments to validate the performance of AugUCB. |
| Researcher Affiliation | Academia | Subhojyoti Mukherjee^1, Naveen Kolar Purushothama^2, Nandan Sudarsanam^3, Balaraman Ravindran^4. 1,4: Department of Computer Science & Engineering, Indian Institute of Technology Madras; 2: Department of Electrical Engineering, Indian Institute of Technology Madras; 3: Department of Management Studies, Indian Institute of Technology Madras |
| Pseudocode | Yes | Algorithm 1: AugUCB |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes simulation experiments where reward distributions are generated (e.g., Gaussian with means r_{1:4} = 0.2 + (0:3)·0.05), but does not use or provide a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes simulation experiments for a multi-armed bandit problem, which does not involve traditional train/validation/test dataset splits. Performance is evaluated by tracking error percentage over time in repeated runs. |
| Hardware Specification | No | The paper describes simulation experiments but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) used for the experiments. |
| Experiment Setup | Yes | Across all experiments, the setting consists of K = 100 arms (indexed i = 1, 2, …, 100), of which S_τ = {6, 7, …, 10}, with the threshold fixed at τ = 0.5. In all the experiments, each algorithm is run independently for 10000 time-steps. At every time-step, the output set ˆS_τ suggested by each algorithm is recorded; the output is counted as an error if ˆS_τ ≠ S_τ. In Figure 1, for each experiment, the percentage of error incurred by the different algorithms is reported as a function of time; the error percentage is obtained by repeating each experiment independently for 500 iterations and computing the fraction of errors. The details of the considered experiments are as follows. Experiment-1: the reward distributions are Gaussian with means r_{1:4} = 0.2 + (0:3)·0.05, r_5 = 0.45, r_6 = 0.55, r_{7:10} = 0.65 + (0:3)·0.05, and r_{11:100} = 0.4. The corresponding variances are σ²_{1:5} = 0.5 and σ²_{6:10} = 0.6, while σ²_{11:100} are chosen independently and uniformly in the interval [0.38, 0.42]. |
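
To make the evaluation protocol concrete, the sketch below reproduces the Experiment-1 reward configuration and the error-percentage metric described in the row above. The means, variances, τ = 0.5, budget T = 10000, and 500 repetitions come from the paper; the uniform round-robin allocation is a hypothetical stand-in for AugUCB (whose full pseudocode is Algorithm 1 in the paper), so the number it prints is illustrative rather than a reproduction of the paper's results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Experiment-1 configuration as described in the paper:
# K = 100 Gaussian arms, threshold tau = 0.5, budget T = 10000,
# true super-threshold set S_tau = {6, ..., 10} (1-indexed).
K, tau, T, N_RUNS = 100, 0.5, 10_000, 500

means = np.empty(K)
means[0:4] = 0.2 + np.arange(4) * 0.05    # r_{1:4} = 0.20, 0.25, 0.30, 0.35
means[4] = 0.45                           # r_5
means[5] = 0.55                           # r_6
means[6:10] = 0.65 + np.arange(4) * 0.05  # r_{7:10} = 0.65, 0.70, 0.75, 0.80
means[10:] = 0.4                          # r_{11:100}

variances = np.empty(K)
variances[0:5] = 0.5                      # sigma^2_{1:5}
variances[5:10] = 0.6                     # sigma^2_{6:10}
variances[10:] = rng.uniform(0.38, 0.42, size=K - 10)  # sigma^2_{11:100}
sigma = np.sqrt(variances)

# Arms whose true mean exceeds tau (0-indexed: 5..9, i.e. arms 6..10).
S_tau = frozenset(np.flatnonzero(means > tau))

def run_uniform(budget: int) -> frozenset:
    """Hypothetical uniform-allocation baseline (NOT AugUCB): each arm
    gets budget // K pulls; classify by empirical mean >= tau."""
    n = budget // K
    samples = rng.normal(means, sigma, size=(n, K))
    return frozenset(np.flatnonzero(samples.mean(axis=0) >= tau))

# Error percentage: fraction of independent runs whose output set
# differs from S_tau, mirroring the paper's evaluation metric.
errors = sum(run_uniform(T) != S_tau for _ in range(N_RUNS))
print(f"Uniform baseline error at T={T}: {100 * errors / N_RUNS:.1f}%")
```

Swapping `run_uniform` for an implementation of the paper's Algorithm 1 would yield the AugUCB curve, and recording the error indicator at every time-step rather than only at T would recreate the Figure 1 plots.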