Conservative Bandits

Authors: Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results obtained in synthetic environments complement our theoretical findings.
Researcher Affiliation | Academia | Yifan Wu (YWU12@UALBERTA.CA), Roshan Shariff (ROSHAN.SHARIFF@UALBERTA.CA), Tor Lattimore (TOR.LATTIMORE@GMAIL.COM), Csaba Szepesvári (SZEPESVA@CS.UALBERTA.CA); Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Pseudocode | Yes | Algorithm 1: Conservative UCB (a sketch of the play rule is given after the table).
Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of its source code.
Open Datasets | No | The experiments use "simulated data" and define mean rewards directly (e.g., "µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4"). This is synthetic data, not a publicly available dataset with a link or citation for access.
Dataset Splits | No | The paper uses simulated data for a bandit problem, where train/validation/test splits in the traditional supervised-learning sense do not directly apply. No explicit split percentages or sample counts are provided.
Hardware Specification | No | The paper does not provide any specific hardware details, such as CPU/GPU models or memory, used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, specific libraries, or solvers).
Experiment Setup | Yes | We tuned the Unbalanced MOSS algorithm with the following parameters: n = K + K / (αµ0); Bi = BK = ... The mean rewards in both experiments are µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4... We fix the horizon and sweep over α ∈ [0, 1]... In the second regime we fix α = 0.1 and plot the long-term average regret... Each data point is an average of N = 4000 i.i.d. samples... n = 10^4 and δ = 1/n. (A simulation sketch of the first regime follows the table.)
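
The Pseudocode row refers to Algorithm 1, Conservative UCB. Below is a minimal Python sketch of the conservative play rule, not the authors' code: it assumes Bernoulli arms with rewards in [0, 1], a known baseline mean mu0 for arm 0, and plain Hoeffding confidence radii; the paper's confidence-interval constants and its treatment of an unknown baseline differ from this simplification.

```python
import numpy as np

def conservative_ucb(means, mu0, alpha, n, delta, rng=None):
    """Sketch of Conservative UCB (Algorithm 1), not the authors' implementation.

    Arm 0 is the default arm with known mean mu0. The conservative constraint
    requires the cumulative reward to stay above (1 - alpha) * t * mu0 at every
    round t. Confidence radii are simple Hoeffding bounds; the paper's constants
    differ. Returns the pseudo-regret over n rounds for Bernoulli arms.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    counts = np.zeros(K)          # number of pulls per arm
    sums = np.zeros(K)            # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    def radius(t, cnt):
        # Hoeffding-style confidence radius; infinite for unplayed arms.
        return np.inf if cnt == 0 else np.sqrt(np.log(K * t / delta) / (2 * cnt))

    for t in range(1, n + 1):
        mu_hat = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
        rad = np.array([radius(t, c) for c in counts])
        ucb = mu_hat + rad
        lcb = np.clip(mu_hat - rad, 0.0, 1.0)
        ucb[0] = lcb[0] = mu0     # the default arm's mean is known exactly
        j = int(np.argmax(ucb))   # optimistic (UCB) arm

        # Pessimistic estimate of the cumulative reward if the UCB arm were played now:
        # lower confidence bounds for everything played so far, plus arm j's lower bound.
        pessimistic = counts[0] * mu0 + float(counts[1:] @ lcb[1:]) + lcb[j]
        if pessimistic >= (1 - alpha) * t * mu0:
            a = j                 # constraint provably safe: play the UCB arm
        else:
            a = 0                 # otherwise fall back to the default arm

        r = rng.binomial(1, means[a])
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]
    return regret
```

The key design point the sketch illustrates is that exploration is only allowed when a pessimistic accounting of the reward already collected still satisfies the (1 - α) baseline constraint; otherwise the default arm is played.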
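
The first regime in the Experiment Setup row (fixed horizon n = 10^4, δ = 1/n, sweep over α) can be reproduced in outline with the sketch below, reusing the conservative_ucb function above. The α grid and the 50 repetitions per point (instead of the paper's N = 4000 i.i.d. samples) are choices made only to keep this example fast, not values from the paper.

```python
# Regime 1 from the Experiment Setup row: fix the horizon and sweep over alpha.
# The means, n, and delta follow the paper; the alpha grid and repetition count are assumptions.
import numpy as np

means = [0.5, 0.6, 0.4, 0.4, 0.4]   # mu0 = 0.5, mu1 = 0.6, mu2 = mu3 = mu4 = 0.4
mu0, n = means[0], 10_000
delta = 1.0 / n
rng = np.random.default_rng(0)

for alpha in (0.01, 0.05, 0.1, 0.2, 0.5, 1.0):
    avg = np.mean([conservative_ucb(means, mu0, alpha, n, delta, rng) for _ in range(50)])
    print(f"alpha = {alpha:4.2f}   average regret over 50 runs = {avg:8.1f}")
```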