Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

Authors: Julian Zimmert, Haipeng Luo, Chen-Yu Wei

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on synthetic data show that our algorithm indeed performs well over different environments.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Copenhagen, Copenhagen, Denmark; (2) Department of Computer Science, University of Southern California, United States.
Pseudocode | Yes | Algorithm 1: FTRL with hybrid regularizer for semi-bandits
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper uses synthetic data for its experiments and describes the generation parameters (e.g., d = 10, m = 5, T = 10^7, Δ = 1/8), but it does not provide a link to or citation for a publicly available dataset.
Dataset Splits | No | The paper does not provide training, validation, or test splits for its synthetic data.
Hardware Specification | No | The paper does not specify any hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions baseline algorithms such as EXP2, LOGBARRIER, COMBUCB, and THOMPSON SAMPLING but does not specify software versions, programming languages, or libraries used.
Experiment Setup | Yes | We test the algorithms on concrete instances of the m-set problem with parameters d = 10, m = 5, T = 10^7. Specifically, the final learning rates η_t for our algorithm, EXP2, and LOGBARRIER are respectively 1/√t, 1/√t, and 1/√t. We measure the performance of the algorithms by the average pseudo-regret over at least 20 runs. For COMBUCB and THOMPSON SAMPLING in the adversarial environment, we increase the number of runs to 500 and 1000 respectively, due to the high variance of the pseudo-regret.
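The Experiment Setup evidence above is concrete enough to sketch as a small simulation harness. The Python sketch below is a hypothetical illustration, not code from the paper: it assumes a standard stochastic m-set instance with d = 10 base arms, actions of size m = 5, and a gap of Δ = 1/8 between the optimal and suboptimal arms, and it averages the pseudo-regret over independent runs as in the paper's protocol. The uniform-random policy is only a placeholder for the paper's FTRL algorithm and the EXP2, LOGBARRIER, COMBUCB, and THOMPSON SAMPLING baselines, and the horizon is shortened from T = 10^7 for illustration.

```python
# Hypothetical sketch of the synthetic m-set semi-bandit experiment: a
# stochastic instance where the m optimal arms have mean loss 1/2 - delta and
# the remaining arms have mean loss 1/2; performance is measured by the
# pseudo-regret averaged over independent runs.
import numpy as np


def run_once(d=10, m=5, T=10_000, delta=1 / 8, seed=0):
    rng = np.random.default_rng(seed)
    # Mean losses: the first m arms are optimal (mean 1/2 - delta), the rest 1/2.
    means = np.full(d, 0.5)
    means[:m] -= delta
    best = np.sort(means)[:m].sum()  # expected per-round loss of the best m-set
    pseudo_regret = 0.0
    for _ in range(T):
        # Placeholder policy: a uniformly random m-set (NOT the paper's algorithm).
        action = rng.choice(d, size=m, replace=False)
        # Semi-bandit feedback: the learner observes the loss of each played arm.
        _observed = rng.binomial(1, means[action])
        pseudo_regret += means[action].sum() - best
    return pseudo_regret


def average_pseudo_regret(n_runs=20, **kwargs):
    # Average over independent runs, mirroring the "at least 20 runs" protocol.
    return float(np.mean([run_once(seed=s, **kwargs) for s in range(n_runs)]))


if __name__ == "__main__":
    print(average_pseudo_regret(n_runs=20, T=10_000))
```

Swapping the placeholder policy for an implementation of Algorithm 1 or of the listed baselines would recover the kind of comparison described in the Experiment Setup row; the paper's adversarial environment would additionally require a non-stationary loss sequence rather than the fixed Bernoulli means assumed here.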