Conservative Bandits

Authors: Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results obtained in synthetic environments complement our theoretical findings.
Researcher Affiliation | Academia | Yifan Wu (YWU12@UALBERTA.CA), Roshan Shariff (ROSHAN.SHARIFF@UALBERTA.CA), Tor Lattimore (TOR.LATTIMORE@GMAIL.COM), Csaba Szepesvári (SZEPESVA@CS.UALBERTA.CA); Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Pseudocode | Yes | Algorithm 1: Conservative UCB (a sketch of the play rule is given after the table).
Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of its source code.
Open Datasets | No | The experiments use "simulated data" and define mean rewards directly (e.g., "µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4"). This is synthetic data, not a publicly available dataset with a link or citation for access.
Dataset Splits | No | The paper uses simulated data for a bandit problem, where train/validation/test splits in the traditional supervised-learning sense do not directly apply. No explicit split percentages or sample counts are provided.
Hardware Specification | No | The paper does not provide any specific hardware details, such as CPU/GPU models or memory, used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, specific libraries, or solvers).
Experiment Setup | Yes | We tuned the Unbalanced MOSS algorithm with the following parameters: n = K + K / (αµ0); Bi = BK = ... The mean rewards in both experiments are µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4... We fix the horizon and sweep over α ∈ [0, 1]... In the second regime we fix α = 0.1 and plot the long-term average regret... Each data point is an average of N = 4000 i.i.d. samples... n = 10^4 and δ = 1/n. (A simulation sketch of the first regime follows the table.)
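
The Pseudocode row refers to Algorithm 1, Conservative UCB. Below is a minimal Python sketch of the conservative play rule, not the authors' code: it assumes Bernoulli arms with rewards in [0, 1], a known baseline mean mu0 for arm 0, and plain Hoeffding confidence radii; the paper's confidence-interval constants and its treatment of an unknown baseline differ from this simplification.

```python
import numpy as np

def conservative_ucb(means, mu0, alpha, n, delta, rng=None):
    """Sketch of Conservative UCB (Algorithm 1), not the authors' implementation.

    Arm 0 is the default arm with known mean mu0. The conservative constraint
    requires the cumulative reward to stay above (1 - alpha) * t * mu0 at every
    round t. Confidence radii are simple Hoeffding bounds; the paper's constants
    differ. Returns the pseudo-regret over n rounds for Bernoulli arms.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    counts = np.zeros(K)          # number of pulls per arm
    sums = np.zeros(K)            # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    def radius(t, cnt):
        # Hoeffding-style confidence radius; infinite for unplayed arms.
        return np.inf if cnt == 0 else np.sqrt(np.log(K * t / delta) / (2 * cnt))

    for t in range(1, n + 1):
        mu_hat = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
        rad = np.array([radius(t, c) for c in counts])
        ucb = mu_hat + rad
        lcb = np.clip(mu_hat - rad, 0.0, 1.0)
        ucb[0] = lcb[0] = mu0     # the default arm's mean is known exactly
        j = int(np.argmax(ucb))   # optimistic (UCB) arm

        # Pessimistic estimate of the cumulative reward if the UCB arm were played now:
        # lower confidence bounds for everything played so far, plus arm j's lower bound.
        pessimistic = counts[0] * mu0 + float(counts[1:] @ lcb[1:]) + lcb[j]
        if pessimistic >= (1 - alpha) * t * mu0:
            a = j                 # constraint provably safe: play the UCB arm
        else:
            a = 0                 # otherwise fall back to the default arm

        r = rng.binomial(1, means[a])
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]
    return regret
```

The key design point the sketch illustrates is that exploration is only allowed when a pessimistic accounting of the reward already collected still satisfies the (1 - α) baseline constraint; otherwise the default arm is played.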
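
The first regime in the Experiment Setup row (fixed horizon n = 10^4, δ = 1/n, sweep over α) can be reproduced in outline with the sketch below, reusing the conservative_ucb function above. The α grid and the 50 repetitions per point (instead of the paper's N = 4000 i.i.d. samples) are choices made only to keep this example fast, not values from the paper.

```python
# Regime 1 from the Experiment Setup row: fix the horizon and sweep over alpha.
# The means, n, and delta follow the paper; the alpha grid and repetition count are assumptions.
import numpy as np

means = [0.5, 0.6, 0.4, 0.4, 0.4]   # mu0 = 0.5, mu1 = 0.6, mu2 = mu3 = mu4 = 0.4
mu0, n = means[0], 10_000
delta = 1.0 / n
rng = np.random.default_rng(0)

for alpha in (0.01, 0.05, 0.1, 0.2, 0.5, 1.0):
    avg = np.mean([conservative_ucb(means, mu0, alpha, n, delta, rng) for _ in range(50)])
    print(f"alpha = {alpha:4.2f}   average regret over 50 runs = {avg:8.1f}")
```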