Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stochastic Multi-Armed Bandits with Control Variates
Authors: Arun Verma, Manjesh Kumar Hanawal
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic problem instances validate performance guarantees of the proposed algorithms. |
| Researcher Affiliation | Academia | Arun Verma, Department of Computer Science, National University of Singapore, EMAIL; Manjesh K. Hanawal, Department of IEOR, IIT Bombay, Mumbai, India, EMAIL |
| Pseudocode | Yes | UCB-CV: UCB-based algorithm for the MAB-CV problem. 1: Input: $K$, $Q$, $\alpha > 1$. 2: Play each arm $i \in [K]$ $Q$ times. 3: for $t = QK + 1, QK + 2, \ldots$ do 4: For all $i \in [K]$: compute $\mathrm{UCB}_{t-1,i}$ as given in Eq. (5). 5: Select $I_t = \arg\max_{i \in [K]} \mathrm{UCB}_{t-1,i}$. 6: Play arm $I_t$ and observe $X_{t,I_t}$ and the associated control variates $W_{t,I_t}$. Increment the value of $N_{I_t}(t)$ by one and re-estimate $\hat{\beta}_{N_{I_t}(t),I_t}$, $\hat{\mu}^c_{N_{I_t}(t),I_t}$, and $\hat{\nu}_{t,N_{I_t}(t)}$. 7: end for |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the methodology was found. |
| Open Datasets | No | We empirically evaluate the performance of UCB-CV... on different synthetically generated problem instances. For all the instances we use K = 10, q = 1, and α = 2. All the experiments are repeated 100 times and cumulative regret with a 95% confidence interval (the vertical line on each curve shows the confidence interval) are shown. Details of each instance are as follows: Instance 1: The reward and associated CV of this instance have a multivariate normal distribution. The reward of each arm has two components. We treated one of the components as CV. In round t, the reward of arm i is given as follows: $X_{t,i} = V_{t,i} + W_{t,i}$, where $V_{t,i} \sim \mathcal{N}(\mu_{v,i}, \sigma^2_{v,i})$ and $W_{t,i} \sim \mathcal{N}(\mu_{w,i}, \sigma^2_{w,i})$. |
| Dataset Splits | No | The paper uses synthetically generated data and runs simulations over time. It does not describe traditional train/validation/test dataset splits as would be found in a supervised learning context. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud resources) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers for replicating the experiment. While plots mention Matlab, this is not a dependency of the experiment itself. |
| Experiment Setup | Yes | For all the instances we use K = 10, q = 1, and α = 2. All the experiments are repeated 100 times |
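
Since no code was released, the following is a minimal Python sketch of how a single arm of Instance 1 could be simulated and its mean reward estimated with a control variate. It is not the authors' implementation. It uses the textbook control-variate estimator (a least-squares coefficient applied to the centered control variate) and does not reproduce the confidence bound of Eq. (5), which is not given in this summary. The function names, the number of samples per run, and the values of `mu_v`, `sigma_v`, `mu_w`, and `sigma_w` are all hypothetical choices made for illustration; only the reward model X = V + W and the 100 repetitions with a 95% confidence interval come from the table above.

```python
import random
import statistics


def cv_mean_estimate(x, w, omega):
    """Textbook control-variate estimate of E[X].

    x: reward samples, w: control-variate samples, omega: known mean of w.
    Returns (mu_c, beta), where beta is the least-squares coefficient.
    """
    n = len(x)
    x_bar = sum(x) / n
    w_centered = [wi - omega for wi in w]  # uses the known CV mean
    denom = sum(c * c for c in w_centered)
    if denom == 0.0:
        return x_bar, 0.0  # no CV variation: fall back to the sample mean
    beta = sum((xi - x_bar) * c for xi, c in zip(x, w_centered)) / denom
    mu_c = x_bar - beta * (sum(w_centered) / n)
    return mu_c, beta


def simulate_arm(n_runs=100, n_samples=200, mu_v=0.6, sigma_v=0.3,
                 mu_w=0.4, sigma_w=0.6, seed=0):
    """Repeat the estimation n_runs times for one arm with X = V + W.

    All parameter values here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    plain, with_cv = [], []
    for _ in range(n_runs):
        w = [rng.gauss(mu_w, sigma_w) for _ in range(n_samples)]
        x = [rng.gauss(mu_v, sigma_v) + wi for wi in w]
        plain.append(sum(x) / n_samples)
        with_cv.append(cv_mean_estimate(x, w, omega=mu_w)[0])
    return plain, with_cv


def mean_with_ci95(values):
    """Mean and half-width of a normal-approximation 95% interval."""
    half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return statistics.fmean(values), half_width


plain, with_cv = simulate_arm()
print("sample mean:      ", mean_with_ci95(plain))
print("control variate:  ", mean_with_ci95(with_cv))
print("variance ratio (CV / plain):",
      statistics.variance(with_cv) / statistics.variance(plain))
```

Under these assumed parameters the control variate accounts for most of the reward variance, so the variance ratio printed at the end should fall well below one. A full replication would still need the arm parameters of each instance and the index of Eq. (5) from the paper itself.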