Bandit Learning with Positive Externalities

Authors: Virag Shah, Jose Blanchet, Ramesh Johari

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Further, in Section 7 we provide simulation results to obtain quantitative insights into the relative performance of different algorithms.
Researcher Affiliation | Academia | Virag Shah, Management Science and Engineering, Stanford University, California, USA 94305, virag@stanford.edu; Jose Blanchet, Management Science and Engineering, Stanford University, California, USA 94305, jblanche@stanford.edu; Ramesh Johari, Management Science and Engineering, Stanford University, California, USA 94305, rjohari@stanford.edu
Pseudocode | Yes | Definition 4. Balanced-Exploration (BE) Algorithm: Given T, let n = w_T ln T. 1. Exploration phase: ... Definition 5. Balanced Exploration with Arm Elimination (BE-AE) Algorithm: Given T, m, and α, as well as θ_a for each a ∈ A, for each time t and each arm a define: ...
Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code is made available.
Open Datasets | No | We simulate our model with m = 2 arms, with externality strength α = 1, arm reward parameters μ_1 = 0.5 and μ_2 = 0.3, and initial biases θ_1 = θ_2 = 1.
Dataset Splits | No | The paper gives simulation parameters but does not specify dataset splits (e.g., train/validation/test percentages or counts), since the evaluation is a simulation rather than one based on pre-existing datasets with established splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used for simulations, such as specific CPU/GPU models or memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Parameters for each algorithm. We simulate UCB(γ) with γ = 3. For Random-explore-then-commit, we set the exploration time as √T (empirically, this performs significantly better than ln T). For BE, we set w_T = β ln ln T with β = 2 (see Definition 4). For BE-AE, cf. Definition 5, we recall that the upper and lower confidence bounds are set as u_a(t) = μ̂_a(t) + p√(ln T / T_a(t)) and l_a(t) = μ̂_a(t) - p√(ln T / T_a(t)), for p = 5c^(-1/2) where c = min_{a,b ∈ A} θ_a / (m(1 + θ_b)). This choice of p was set in the paper for technical reasons, but it is suboptimal for finite T. The choice p = 1/2 achieves significantly better performance for this experimental setup. The performance is sensitive to small changes in p, as the plots illustrate when choosing p = 5/2. In contrast, in our experiments, we found that the performance of BE is relatively robust to the choice of β.
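
The BE-AE confidence bounds and the constant p quoted in the Experiment Setup row are easy to misread in the extracted text, so below is a minimal Python sketch of how they would be computed with the parameters reported above (m = 2, θ_1 = θ_2 = 1, μ_1 = 0.5, μ_2 = 0.3). The exact placement of the square root, the horizon T, and the names conf_bounds, theoretical_p are this sketch's assumptions, not the paper's notation.

```python
import math

# Simulation parameters quoted in the table above (Section 7 of the paper):
# m = 2 arms, externality strength alpha = 1, mu = (0.5, 0.3), theta = (1, 1).
m = 2
theta = [1.0, 1.0]
T = 100_000  # horizon; the quoted setup does not fix T, so this value is illustrative

# c = min_{a,b in A} theta_a / (m * (1 + theta_b)), as read from the quoted setup.
c = min(theta[a] / (m * (1.0 + theta[b])) for a in range(m) for b in range(m))

# Theoretical choice from the paper's analysis (p = 5 * c^(-1/2)) versus the
# empirically better p = 1/2 reported in the Experiment Setup row.
theoretical_p = 5.0 * c ** -0.5
empirical_p = 0.5

def conf_bounds(mu_hat, pulls, p, horizon):
    """u_a(t), l_a(t) = mu_hat +/- p * sqrt(ln T / T_a(t)), per the quoted formula."""
    radius = p * math.sqrt(math.log(horizon) / max(pulls, 1))
    return mu_hat + radius, mu_hat - radius

if __name__ == "__main__":
    print(f"c = {c:.3f}, theoretical p = {theoretical_p:.2f}, empirical p = {empirical_p}")
    # Example: an arm with empirical mean 0.5 after 200 pulls.
    print(conf_bounds(0.5, 200, empirical_p, T))
```

Under this reading, θ_1 = θ_2 = 1 and m = 2 give c = 1/4 and a theoretical p = 10, which fits the report's observation that the theory-driven choice is far from the empirically better p = 1/2 at finite T.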
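Definitions 4 and 5 are only partially quoted in the Pseudocode row, so the elimination rule itself cannot be reproduced verbatim here. The sketch below shows a generic UCB/LCB arm-elimination loop of the kind BE-AE belongs to: an arm is dropped once its upper bound falls below another active arm's lower bound, and among active arms the least-pulled one is sampled. This is a standard pattern, not the paper's exact BE-AE rule, and it ignores the positive-externality arrival process that a faithful reproduction would need; the function name eliminate_arms_run is this sketch's own.

```python
import math
import random

def eliminate_arms_run(mu, T, p):
    """Generic confidence-bound arm elimination; NOT the paper's exact BE-AE definition."""
    m = len(mu)
    pulls = [0] * m
    rew_sum = [0.0] * m
    active = set(range(m))
    for _ in range(T):
        # Pull the active arm with the fewest pulls so far (balanced exploration).
        a = min(active, key=lambda i: pulls[i])
        pulls[a] += 1
        rew_sum[a] += 1.0 if random.random() < mu[a] else 0.0  # Bernoulli reward
        # Eliminate any active arm whose UCB falls below some other active arm's LCB.
        if len(active) > 1:
            ucb, lcb = {}, {}
            for i in active:
                mean = rew_sum[i] / pulls[i] if pulls[i] else 0.0
                radius = p * math.sqrt(math.log(T) / max(pulls[i], 1))
                ucb[i], lcb[i] = mean + radius, mean - radius
            active = {i for i in active
                      if ucb[i] >= max(lcb[j] for j in active if j != i)}
    return active, pulls

if __name__ == "__main__":
    surviving, pulls = eliminate_arms_run(mu=[0.5, 0.3], T=20_000, p=0.5)
    print("surviving arms:", surviving, "pulls:", pulls)
```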