Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Conservative Bandits
Authors: Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvari
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results obtained in synthetic environments complement our theoretical findings. |
| Researcher Affiliation | Academia | Yifan Wu EMAIL Roshan Shariff EMAIL Tor Lattimore EMAIL Csaba Szepesvári EMAIL Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada |
| Pseudocode | Yes | Algorithm 1: Conservative UCB |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of its source code. |
| Open Datasets | No | The experiments use "simulated data" and define mean rewards directly (e.g., "µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4"). This is synthetic data, not a publicly available dataset with a link or citation for access. |
| Dataset Splits | No | The paper uses simulated data for a bandit problem, where the concept of train/validation/test splits in the traditional supervised learning sense does not directly apply. No explicit split percentages or sample counts for training, validation, or testing are provided. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries or solvers). |
| Experiment Setup | Yes | We tuned the Unbalanced MOSS algorithm with the following parameters. n = K + K / (αµ0) ; Bi = BK = ... The mean rewards in both experiments are µ0 = 0.5, µ1 = 0.6, µ2 = µ3 = µ4 = 0.4... We fix the horizon and sweep over α ∈ [0, 1]... In the second regime we fix α = 0.1 and plot the long-term average regret... Each data point is an average of N = 4000 i.i.d. samples... n = 10^4 and δ = 1/n. |
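
Because the paper releases no code, the synthetic setup quoted in the Experiment Setup row can only be approximated. The following is a minimal sketch of what such a simulation might look like, not the paper's Algorithm 1. It uses the quoted mean rewards and δ = 1/n. Everything else is an assumption made for illustration: Bernoulli rewards, a known default-arm mean µ0, a Hoeffding-style confidence radius with a union bound, and the hypothetical function name `conservative_ucb_sketch`. The idea is to pick the arm with the largest upper confidence bound, but fall back to the default arm (arm 0) whenever a pessimistic estimate of cumulative expected reward would drop below (1 − α)·µ0·t.

```python
import math
import random

# Mean rewards quoted in the Experiment Setup row; arm 0 is the default arm.
MEANS = [0.5, 0.6, 0.4, 0.4, 0.4]


def conservative_ucb_sketch(means, horizon, alpha, seed=0):
    """Simplified conservative UCB loop (illustrative, not the paper's algorithm).

    Returns (pull counts per arm, sum of true means of the arms played).
    """
    rng = random.Random(seed)
    k = len(means)
    mu0 = means[0]            # assumption: the default-arm mean is known
    delta = 1.0 / horizon     # delta = 1/n, as in the quoted setup
    counts = [0] * k
    sums = [0.0] * k
    total_mean = 0.0

    def bounds(i):
        """Hoeffding-style (lower, upper) bounds on arm i's mean, clipped to [0, 1]."""
        if i == 0:
            return mu0, mu0
        if counts[i] == 0:
            return 0.0, 1.0
        avg = sums[i] / counts[i]
        # Assumed radius; the paper's exact confidence width may differ.
        radius = math.sqrt(math.log(2 * k * horizon / delta) / (2 * counts[i]))
        return max(0.0, avg - radius), min(1.0, avg + radius)

    for t in range(1, horizon + 1):
        lower_upper = [bounds(i) for i in range(k)]
        # Optimistic choice: the arm with the largest upper bound.
        candidate = max(range(k), key=lambda i: lower_upper[i][1])
        # Pessimistic cumulative expected reward if the candidate is played now.
        safe_total = sum(lower_upper[i][0] * counts[i] for i in range(k))
        safe_total += lower_upper[candidate][0]
        if safe_total < (1.0 - alpha) * mu0 * t:
            candidate = 0     # constraint at risk: play the default arm instead
        # Assumption: rewards are Bernoulli with the quoted means.
        reward = 1.0 if rng.random() < means[candidate] else 0.0
        counts[candidate] += 1
        sums[candidate] += reward
        total_mean += means[candidate]
    return counts, total_mean
```

To mimic the first regime in the table, one could call `conservative_ucb_sketch(MEANS, 10**4, alpha, seed=s)` for a grid of `alpha` values in [0, 1] and average the regret `0.6 * horizon - total_mean` over many seeds (the paper reports N = 4000 samples per data point). With `alpha = 0` this sketch never leaves the default arm, since an untried arm has a lower bound of zero.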