Scalable Generalized Linear Bandits: Online Computation and Hashing

Authors: Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, Rebecca Willett

NeurIPS 2017

Reproducibility — Variable: Result — LLM Response
Research Type: Experimental — We now show our experiment results comparing GLB algorithms and hash-amenable algorithms. GLB Algorithms: We compare GLOC with two different algorithms: UCB-GLM [28] and Online Learning for Logit Model (OL2M) [41]. For each trial, we draw θ ∈ ℝ^d and N arms (X) uniformly at random from the unit sphere. We set d = 10 and X_t = X, ∀t ≥ 1. ... We plot the cumulative regret under the logit model in Figure 2(a). All three methods perform similarly, and we do not find any statistically significant difference based on a paired t-test.
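The synthetic setup quoted above (θ and the arm set drawn uniformly at random from the unit sphere, with d = 10 and a fixed arm set across rounds) can be sketched as follows. The paper does not give sampling code, so this is one standard construction, assuming the usual trick of normalizing i.i.d. Gaussian vectors; the arm count N and the seed are illustrative choices, not values from the paper.

```python
import numpy as np

def sample_unit_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d
    by normalizing i.i.d. standard Gaussian vectors."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d, N = 10, 100  # d = 10 as in the paper; N = 100 is an illustrative arm count

theta = sample_unit_sphere(1, d, rng)[0]  # hidden parameter theta
X = sample_unit_sphere(N, d, rng)         # fixed arm set: X_t = X for all t >= 1
```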
Researcher Affiliation: Academia — Kwang-Sung Jun (UW-Madison, kjun@discovery.wisc.edu); Aniruddha Bhargava (UW-Madison, aniruddha@wisc.edu); Robert Nowak (UW-Madison, rdnowak@wisc.edu); Rebecca Willett (UW-Madison, willett@discovery.wisc.edu)
Pseudocode: Yes — Algorithm 1 (GLOC)
Open Source Code: No — The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets: No — The paper describes generating synthetic data for experiments ('We draw θ ∈ ℝ^d and N arms (X) uniformly at random from the unit sphere.') and does not use a publicly available or open dataset with provided access information.
Dataset Splits: No — The paper describes a bandit problem setup with generated data and repeated trials, but does not specify training, validation, and test dataset splits in the conventional sense for reproducibility.
Hardware Specification: No — The paper does not provide specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies: No — The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') that would be needed for replication.
Experiment Setup: Yes — For each trial, we draw θ ∈ ℝ^d and N arms (X) uniformly at random from the unit sphere. We set d = 10 and X_t = X, ∀t ≥ 1. ... For OL2M we set the squared radius γ_t = c·log(det(Z_t)/det(Z_1)), where c is a tuning parameter. For UCB-GLM, we set the radius as α = c√(d log t). For GLOC, we replace β_t^ONS with c·Σ_{s=1}^t g_s² ‖x_s‖²_{A_s⁻¹}. While parameter tuning in practice is nontrivial, for the sake of comparison we tune c ∈ {10^1, 10^0.5, ..., 10^{−3}} and report the best one. We perform 40 trials up to time T = 3000 for each method and compute confidence bounds on the regret.
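The tuning grid and the quoted confidence radii can be sketched as below. The formulas follow the quoted setup; the use of `slogdet` (rather than taking determinants directly) is an implementation choice for numerical stability, not something the paper specifies, and the example matrices in the usage are purely illustrative.

```python
import numpy as np

# Tuning grid c in {10^1, 10^0.5, ..., 10^-3}: exponents step down by 0.5.
grid = 10.0 ** np.arange(1.0, -3.5, -0.5)  # 9 candidate values of c

def ol2m_squared_radius(Z_t, Z_1, c):
    """OL2M squared radius gamma_t = c * log(det(Z_t) / det(Z_1)).

    Computed via log-determinants (slogdet) to avoid overflow in det()
    for larger matrices -- an implementation choice, not from the paper.
    """
    _, logdet_t = np.linalg.slogdet(Z_t)
    _, logdet_1 = np.linalg.slogdet(Z_1)
    return c * (logdet_t - logdet_1)

def ucb_glm_radius(t, d, c):
    """UCB-GLM radius alpha = c * sqrt(d * log t)."""
    return c * np.sqrt(d * np.log(t))

# Illustrative usage: d = 3, Z_t = 2I, Z_1 = I, so log det ratio = 3 * log 2.
gamma = ol2m_squared_radius(2.0 * np.eye(3), np.eye(3), c=1.0)
alpha = ucb_glm_radius(t=100, d=10, c=1.0)
```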