Stochastic Multi-Armed Bandits with Control Variates
Authors: Arun Verma, Manjesh Kumar Hanawal
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic problem instances validate performance guarantees of the proposed algorithms. |
| Researcher Affiliation | Academia | Arun Verma, Department of Computer Science, National University of Singapore (arun@comp.nus.edu.sg); Manjesh K. Hanawal, Department of IEOR, IIT Bombay, Mumbai, India (mhanawal@iitb.ac.in) |
| Pseudocode | Yes | UCB-CV: UCB-based algorithm for the MAB-CV problem. 1: Input: $K$, $Q$, $\alpha > 1$. 2: Play each arm $i \in [K]$ $Q$ times. 3: for $t = QK + 1, QK + 2, \ldots$ do. 4: $\forall i \in [K]$: compute $\mathrm{UCB}_{t-1,i}$ as given in Eq. (5). 5: Select $I_t = \arg\max_{i \in [K]} \mathrm{UCB}_{t-1,i}$. 6: Play arm $I_t$ and observe $X_{t,I_t}$ and the associated control variates $W_{t,I_t}$. Increment the value of $N_{I_t}(t)$ by one and re-estimate $\hat{\beta}_{N_{I_t}(t),I_t}$, $\hat{\mu}^c_{N_{I_t}(t),I_t}$ and $\hat{\nu}_{t,N_{I_t}(t)}$. 7: end for |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the methodology was found. |
| Open Datasets | No | We empirically evaluate the performance of UCB-CV... on different synthetically generated problem instances. For all the instances, we use K = 10, q = 1, and α = 2. All the experiments are repeated 100 times, and the cumulative regret with a 95% confidence interval (the vertical line on each curve shows the confidence interval) is shown. Details of each instance are as follows: Instance 1: The reward and associated CV of this instance have a multivariate normal distribution. The reward of each arm has two components. We treated one of the components as CV. In round $t$, the reward of arm $i$ is given as follows: $X_{t,i} = V_{t,i} + W_{t,i}$, where $V_{t,i} \sim \mathcal{N}(\mu_{v,i}, \sigma^2_{v,i})$ and $W_{t,i} \sim \mathcal{N}(\mu_{w,i}, \sigma^2_{w,i})$. |
| Dataset Splits | No | The paper uses synthetically generated data and runs simulations over time. It does not describe traditional train/validation/test dataset splits as would be found in a supervised learning context. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud resources) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers for replicating the experiment. While plots mention Matlab, this is not a dependency of the experiment itself. |
| Experiment Setup | Yes | For all the instances, we use K = 10, q = 1, and α = 2. All the experiments are repeated 100 times. |
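
Since no code was released, the following is a minimal Python sketch of how the Instance 1 rewards quoted in the table could be generated and how a control variate can reduce the variance of a mean-reward estimate. It uses the textbook control-variate estimator (sample mean of the reward plus a regression coefficient times the gap between the known and observed control-variate means), which may differ in detail from the paper's estimator. The function names `sample_instance1` and `cv_mean`, and all numeric parameters, are hypothetical and chosen only for illustration.

```python
import random


def sample_instance1(mu_v, sigma_v, mu_w, sigma_w, n, rng):
    """Draw n (reward, control variate) pairs with X = V + W (Instance 1 form)."""
    pairs = []
    for _ in range(n):
        v = rng.gauss(mu_v, sigma_v)  # component V of the reward
        w = rng.gauss(mu_w, sigma_w)  # component W, treated as the control variate
        pairs.append((v + w, w))
    return pairs


def cv_mean(xs, ws, omega):
    """Textbook control-variate mean estimate (assumption, not the paper's exact form).

    omega is the known mean of the control variate W.
    Returns (estimate, beta), where beta is the sample Cov(X, W) / Var(W).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    w_bar = sum(ws) / n
    s_ww = sum((w - w_bar) ** 2 for w in ws)
    s_xw = sum((x - x_bar) * (w - w_bar) for x, w in zip(xs, ws))
    beta = s_xw / s_ww if s_ww > 0 else 0.0  # fall back to the plain mean
    return x_bar + beta * (omega - w_bar), beta


# Hypothetical single-arm parameters, for illustration only.
rng = random.Random(0)
pairs = sample_instance1(mu_v=0.6, sigma_v=0.1, mu_w=0.4, sigma_w=0.1, n=200, rng=rng)
xs, ws = zip(*pairs)
estimate, beta = cv_mean(xs, ws, omega=0.4)
print(f"plain mean = {sum(xs) / len(xs):.4f}, CV estimate = {estimate:.4f}, beta = {beta:.3f}")
```

Because X = V + W here, the fitted coefficient should be close to 1, so the adjusted estimate mostly removes the noise contributed by W and leaves only the noise in V.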