Bandit Learning with Positive Externalities
Authors: Virag Shah, Jose Blanchet, Ramesh Johari
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, in Section 7 we provide simulation results to obtain quantitative insights into the relative performance of different algorithms. |
| Researcher Affiliation | Academia | Virag Shah Management Science and Engineering Stanford University California, USA 94305 virag@stanford.edu Jose Blanchet Management Science and Engineering Stanford University California, USA 94305 jblanche@stanford.edu Ramesh Johari Management Science and Engineering Stanford University California, USA 94305 rjohari@stanford.edu |
| Pseudocode | Yes | Definition 4. Balanced-Exploration (BE) Algorithm: Given T, let n = w_T ln T. 1. Exploration phase: ... Definition 5. Balanced Exploration with Arm Elimination (BE-AE) Algorithm: Given T, m, and α, as well as θ_a for each a ∈ A, for each time t and each arm a define: ... |
| Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code is made available. |
| Open Datasets | No | We simulate our model with m = 2 arms, with externality strength α = 1, arm reward parameters µ1 = 0.5 and µ2 = 0.3, and initial biases θ1 = θ2 = 1. |
| Dataset Splits | No | The paper describes simulation parameters but does not specify dataset splits (e.g., train/validation/test percentages or counts), as it describes a simulation setup rather than using pre-existing datasets with established splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for simulations, such as specific CPU/GPU models or memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Parameters for each algorithm. We simulate UCB(γ) with γ = 3. For Random-explore-then-commit, we set the exploration time as √T (empirically, this performs significantly better than ln T). For BE, we set w_T = β ln ln T with β = 2 (see Definition 4). For BE-AE, cf. Definition 5, we recall that the upper and lower confidence bounds are set as u_a(t) = µ̂_a(t) + √(p ln T / T_a(t)) and l_a(t) = µ̂_a(t) − √(p ln T / T_a(t)) for p = 5c^(−1/2), where c = min_{a,b∈A} θ_a / (m(1+θ_b)). This choice of p was made in the paper for technical reasons, but it is unfortunately suboptimal for finite T. The choice p = 1/2 achieves significantly better performance for this experimental setup. Performance is sensitive to small changes in p, as the plots illustrate for p = 5/2. In contrast, in our experiments we found that the performance of BE is relatively robust to the choice of β. |
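Since no source code is released, the simulation setup quoted above (m = 2 arms, α = 1, µ1 = 0.5, µ2 = 0.3, θ1 = θ2 = 1, UCB with γ = 3) can only be sketched from the description. The sketch below is our own minimal reading, not the authors' implementation: the externality model used here (an arriving user "likes" arm a with probability proportional to (θ_a + N_a)^α, where N_a counts past successes on arm a) is an assumption, and `simulate_ucb` is a hypothetical helper name.

```python
import math
import random

def simulate_ucb(T=2000, gamma=3.0, mu=(0.5, 0.3), theta=(1.0, 1.0),
                 alpha=1.0, seed=0):
    """Sketch of UCB(gamma) under a simplified positive-externality model.

    Assumption (not verbatim from the paper): the user arriving at time t
    likes arm a with probability (theta_a + N_a)^alpha / sum_b (theta_b + N_b)^alpha,
    where N_a is the number of past successes on arm a; the pulled arm then
    pays Bernoulli(mu_a) only if the user likes it.
    """
    rng = random.Random(seed)
    m = len(mu)
    pulls = [0] * m   # T_a(t): number of times arm a was pulled
    succ = [0] * m    # N_a(t): cumulative successes on arm a
    est = [0.0] * m   # empirical success rate per arm
    total = 0
    for t in range(1, T + 1):
        if t <= m:
            a = t - 1  # pull each arm once to initialize the estimates
        else:
            # UCB(gamma) index: empirical mean plus exploration bonus
            a = max(range(m),
                    key=lambda i: est[i] + math.sqrt(gamma * math.log(t) / pulls[i]))
        # positive externality: preference grows with past successes
        w = [(theta[i] + succ[i]) ** alpha for i in range(m)]
        likes = rng.random() < w[a] / sum(w)
        r = 1 if (likes and rng.random() < mu[a]) else 0
        pulls[a] += 1
        succ[a] += r
        est[a] = succ[a] / pulls[a]
        total += r
    return total, pulls
```

Under this model the self-reinforcing dynamic the paper studies is visible directly: early successes on one arm raise the chance that later users like it, so a policy that commits too quickly can lock in the suboptimal arm.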