On Limited-Memory Subsampling Strategies for Bandits

Authors: Dorian Baudry, Yoan Russac, Olivier Cappé

ICML 2021

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Extensive numerical simulations highlight the merits of this approach, particularly when the changes are not only affecting the means of the rewards.
Researcher Affiliation | Academia | Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France; DI ENS, CNRS, Inria, ENS, Université PSL, Paris, France.
Pseudocode | Yes | Algorithm 1 LB-SDA (a hedged sketch of one round is given below the table).
Open Source Code | Yes | The code for obtaining the different figures reported in the paper is available at https://github.com/YRussac/LB-SDA.
Open Datasets | No | The paper uses simulated environments based on Bernoulli and Gaussian distributions, rather than pre-existing public datasets, and therefore does not provide access information for a public dataset (a sketch of such an environment follows below the table).
Dataset Splits | No | The paper describes experiments on simulated bandit environments and does not specify traditional train/validation/test dataset splits. Performance is evaluated over a horizon T using independent replications.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, or memory amounts) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers like Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup | Yes | To allow for a fair comparison, SW-LB-SDA uses the same window τ = 2√(T log(T)/Γ_T) that is recommended for SW-UCB (Garivier & Moulines, 2011). D-UCB uses the discount factor suggested by Garivier & Moulines (2011), 1/(1−γ) = 4√(T/Γ_T). For CUSUM, α and h are tuned using the suggestions of Liu et al. (2017), namely α = √(Γ_T/T) log(T/Γ_T) and h = log(T/Γ_T). A snippet computing these values is given below the table.