Bandit Learning with Positive Externalities

Authors: Virag Shah, Jose Blanchet, Ramesh Johari

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Further, in Section 7 we provide simulation results to obtain quantitative insights into the relative performance of different algorithms.
Researcher Affiliation | Academia | Virag Shah, Management Science and Engineering, Stanford University, California, USA 94305, virag@stanford.edu; Jose Blanchet, Management Science and Engineering, Stanford University, California, USA 94305, jblanche@stanford.edu; Ramesh Johari, Management Science and Engineering, Stanford University, California, USA 94305, rjohari@stanford.edu
Pseudocode | Yes | Definition 4. Balanced-Exploration (BE) Algorithm: Given T, let n = w_T ln T. 1. Exploration phase: ... Definition 5. Balanced Exploration with Arm Elimination (BE-AE) Algorithm: Given T, m, and α, as well as θ_a for each a ∈ A, for each time t and each arm a define: ...
Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code is made available.
Open Datasets | No | We simulate our model with m = 2 arms, with externality strength α = 1, arm reward parameters μ_1 = 0.5 and μ_2 = 0.3, and initial biases θ_1 = θ_2 = 1.
Dataset Splits | No | The paper gives simulation parameters but does not specify dataset splits (e.g., train/validation/test percentages or counts), since the evaluation is a simulation rather than one based on pre-existing datasets with established splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used for simulations, such as specific CPU/GPU models or memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Parameters for each algorithm. We simulate UCB(γ) with γ = 3. For Random-explore-then-commit, we set the exploration time as √T (empirically, this performs significantly better than ln T). For BE, we set w_T = β ln ln T with β = 2 (see Definition 4). For BE-AE, cf. Definition 5, we recall that the upper and lower confidence bounds are set as u_a(t) = μ̂_a(t) + p√(ln T / T_a(t)) and l_a(t) = μ̂_a(t) - p√(ln T / T_a(t)), for p = 5c^(-1/2) where c = min_{a,b ∈ A} θ_a / (m(1 + θ_b)). This choice of p was set in the paper for technical reasons, but it is suboptimal for finite T. The choice p = 1/2 achieves significantly better performance for this experimental setup. The performance is sensitive to small changes in p, as the plots illustrate when choosing p = 5/2. In contrast, in our experiments, we found that the performance of BE is relatively robust to the choice of β.
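
The BE-AE confidence bounds and the constant p quoted in the Experiment Setup row are easy to misread in the extracted text, so below is a minimal Python sketch of how they would be computed with the parameters reported above (m = 2, θ_1 = θ_2 = 1, μ_1 = 0.5, μ_2 = 0.3). The exact placement of the square root, the horizon T, and the names conf_bounds, theoretical_p are this sketch's assumptions, not the paper's notation.

```python
import math

# Simulation parameters quoted in the table above (Section 7 of the paper):
# m = 2 arms, externality strength alpha = 1, mu = (0.5, 0.3), theta = (1, 1).
m = 2
theta = [1.0, 1.0]
T = 100_000  # horizon; the quoted setup does not fix T, so this value is illustrative

# c = min_{a,b in A} theta_a / (m * (1 + theta_b)), as read from the quoted setup.
c = min(theta[a] / (m * (1.0 + theta[b])) for a in range(m) for b in range(m))

# Theoretical choice from the paper's analysis (p = 5 * c^(-1/2)) versus the
# empirically better p = 1/2 reported in the Experiment Setup row.
theoretical_p = 5.0 * c ** -0.5
empirical_p = 0.5

def conf_bounds(mu_hat, pulls, p, horizon):
    """u_a(t), l_a(t) = mu_hat +/- p * sqrt(ln T / T_a(t)), per the quoted formula."""
    radius = p * math.sqrt(math.log(horizon) / max(pulls, 1))
    return mu_hat + radius, mu_hat - radius

if __name__ == "__main__":
    print(f"c = {c:.3f}, theoretical p = {theoretical_p:.2f}, empirical p = {empirical_p}")
    # Example: an arm with empirical mean 0.5 after 200 pulls.
    print(conf_bounds(0.5, 200, empirical_p, T))
```

Under this reading, θ_1 = θ_2 = 1 and m = 2 give c = 1/4 and a theoretical p = 10, which fits the report's observation that the theory-driven choice is far from the empirically better p = 1/2 at finite T.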
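Definitions 4 and 5 are only partially quoted in the Pseudocode row, so the elimination rule itself cannot be reproduced verbatim here. The sketch below shows a generic UCB/LCB arm-elimination loop of the kind BE-AE belongs to: an arm is dropped once its upper bound falls below another active arm's lower bound, and among active arms the least-pulled one is sampled. This is a standard pattern, not the paper's exact BE-AE rule, and it ignores the positive-externality arrival process that a faithful reproduction would need; the function name eliminate_arms_run is this sketch's own.

```python
import math
import random

def eliminate_arms_run(mu, T, p):
    """Generic confidence-bound arm elimination; NOT the paper's exact BE-AE definition."""
    m = len(mu)
    pulls = [0] * m
    rew_sum = [0.0] * m
    active = set(range(m))
    for _ in range(T):
        # Pull the active arm with the fewest pulls so far (balanced exploration).
        a = min(active, key=lambda i: pulls[i])
        pulls[a] += 1
        rew_sum[a] += 1.0 if random.random() < mu[a] else 0.0  # Bernoulli reward
        # Eliminate any active arm whose UCB falls below some other active arm's LCB.
        if len(active) > 1:
            ucb, lcb = {}, {}
            for i in active:
                mean = rew_sum[i] / pulls[i] if pulls[i] else 0.0
                radius = p * math.sqrt(math.log(T) / max(pulls[i], 1))
                ucb[i], lcb[i] = mean + radius, mean - radius
            active = {i for i in active
                      if ucb[i] >= max(lcb[j] for j in active if j != i)}
    return active, pulls

if __name__ == "__main__":
    surviving, pulls = eliminate_arms_run(mu=[0.5, 0.3], T=20_000, p=0.5)
    print("surviving arms:", surviving, "pulls:", pulls)
```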