Conservative Bandits
Authors: Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvari
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results obtained in synthetic environments complement our theoretical findings. |
| Researcher Affiliation | Academia | Yifan Wu (YWU12@UALBERTA.CA), Roshan Shariff (ROSHAN.SHARIFF@UALBERTA.CA), Tor Lattimore (TOR.LATTIMORE@GMAIL.COM), Csaba Szepesvári (SZEPESVA@CS.UALBERTA.CA); Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada |
| Pseudocode | Yes | Algorithm 1: Conservative UCB |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of its source code. |
| Open Datasets | No | The experiments use "simulated data" and define mean rewards directly (e.g., "µ₀ = 0.5, µ₁ = 0.6, µ₂ = µ₃ = µ₄ = 0.4"). This is synthetic data, not a publicly available dataset with a link or citation for access. |
| Dataset Splits | No | The paper uses simulated data for a bandit problem, where the concept of train/validation/test splits in the traditional supervised learning sense does not directly apply. No explicit split percentages or sample counts for training, validation, or testing are provided. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries or solvers). |
| Experiment Setup | Yes | We tuned the Unbalanced MOSS algorithm with the following parameters. n = K + K/(αµ₀); Bᵢ = B_K = ... The mean rewards in both experiments are µ₀ = 0.5, µ₁ = 0.6, µ₂ = µ₃ = µ₄ = 0.4... We fix the horizon and sweep over α ∈ [0, 1]... In the second regime we fix α = 0.1 and plot the long-term average regret... Each data point is an average of N = 4000 i.i.d. samples... n = 10⁴ and δ = 1/n. |
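To make the experiment-setup row concrete, here is a minimal simulation sketch in the spirit of Algorithm 1 (Conservative UCB); it is not the authors' code. The idea illustrated is: play the optimistic (UCB) arm only when a pessimistic estimate of the cumulative reward, including that arm, stays at or above the baseline (1 − α) t µ₀; otherwise fall back to the default arm 0, whose mean µ₀ is known. Several details are assumptions, not taken from the source: rewards are Bernoulli, the confidence width is a Hoeffding bound with a crude union bound (the paper's confidence function may differ), the constraint is checked on the expected rewards of the arms played, and the names `conservative_ucb_run` and `average_regret` are hypothetical.

```python
import math
import random


def conservative_ucb_run(means, alpha, n, delta, rng):
    """Simulate one run of a Conservative-UCB-style policy (illustrative sketch).

    means[0] is the default arm, whose mean mu0 the learner is assumed to know;
    means[1:] are unknown. Rewards are assumed Bernoulli (hypothetical choice).
    Returns (pseudo_regret, default_pulls, violated).
    """
    k = len(means) - 1                      # number of non-default arms
    mu0 = means[0]
    counts = [0] * (k + 1)
    sums = [0.0] * (k + 1)
    # Assumed Hoeffding width with a crude union bound over arms, rounds and
    # sample counts; the confidence function in the paper may differ.
    log_term = math.log(2 * k * n * n / delta)
    best = max(means)
    regret = 0.0
    expected_total = 0.0                    # sum of the means of the arms played
    violated = False

    for t in range(1, n + 1):
        upper = [mu0] * (k + 1)             # default arm: both bounds equal mu0
        lower = [mu0] * (k + 1)
        for i in range(1, k + 1):
            if counts[i] == 0:
                upper[i], lower[i] = math.inf, 0.0
            else:
                mean_hat = sums[i] / counts[i]
                width = math.sqrt(log_term / (2 * counts[i]))
                upper[i] = mean_hat + width
                lower[i] = max(0.0, mean_hat - width)
        j = max(range(k + 1), key=lambda i: upper[i])   # optimistic choice
        # Pessimistic cumulative reward if j is played now, versus the baseline.
        pessimistic = sum(counts[i] * lower[i] for i in range(k + 1)) + lower[j]
        arm = j if pessimistic >= (1 - alpha) * t * mu0 else 0

        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
        expected_total += means[arm]
        # Record whether the expected-reward constraint was ever broken.
        if expected_total < (1 - alpha) * t * mu0 - 1e-9:
            violated = True
    return regret, counts[0], violated


def average_regret(means, alpha, n, runs, seed=0):
    """Average pseudo-regret over independent runs, with delta = 1/n."""
    rng = random.Random(seed)
    delta = 1.0 / n
    total = 0.0
    for _ in range(runs):
        regret, _, _ = conservative_ucb_run(means, alpha, n, delta, rng)
        total += regret
    return total / runs


if __name__ == "__main__":
    means = [0.5, 0.6, 0.4, 0.4, 0.4]       # mu0, mu1, ..., mu4 as listed above
    for alpha in (0.0, 0.1, 0.5, 1.0):
        avg = average_regret(means, alpha, n=1000, runs=20)
        print(f"alpha={alpha:.1f}  average regret={avg:.2f}")
```

In this sketch, α = 0 means the check can never pass (unplayed arms have lower bound 0), so the policy always plays the default arm and its regret is (max µ − µ₀)·n. The horizon and number of runs in the usage block are much smaller than the source's n = 10⁴ and N = 4000 purely to keep pure-Python runtime short.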