On Limited-Memory Subsampling Strategies for Bandits
Authors: Dorian Baudry, Yoan Russac, Olivier Cappé
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical simulations highlight the merits of this approach, particularly when the changes are not only affecting the means of the rewards. |
| Researcher Affiliation | Academia | ¹Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9198-CRIStAL, F-59000 Lille, France; ²DI ENS, CNRS, Inria, ENS, Université PSL, Paris, France. |
| Pseudocode | Yes | Algorithm 1 LB-SDA |
| Open Source Code | Yes | The code for obtaining the different figures reported in the paper is available at https://github.com/YRussac/LB-SDA. |
| Open Datasets | No | The paper uses simulated environments based on Bernoulli and Gaussian distributions, rather than pre-existing public datasets, and therefore does not provide access information for a public dataset. |
| Dataset Splits | No | The paper describes experiments on simulated bandit environments and does not specify traditional train/validation/test dataset splits. The performance is evaluated over a 'horizon T' using 'independent replications'. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, or memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers like Python 3.8, PyTorch 1.9) required to replicate the experiments. |
| Experiment Setup | Yes | To allow for fair comparison, we use for SW-LB-SDA the same value of $\tau = 2\sqrt{T\log(T)/\Gamma_T}$ that is recommended for SW-UCB (Garivier & Moulines, 2011). D-UCB uses the discount factor suggested by Garivier & Moulines (2011), $1/(1-\gamma) = 4\sqrt{T/\Gamma_T}$. For CUSUM, $\alpha$ and $h$ are tuned using suggestions from Liu et al. (2017), namely $\alpha = \sqrt{(\Gamma_T/T)\log(T/\Gamma_T)}$ and $h = \log(T/\Gamma_T)$. |
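
The tuning rules quoted in the Experiment Setup row can be evaluated directly from the horizon T and the number of breakpoints Γ_T. The sketch below is illustrative only and is not taken from the authors' repository: the function name `tuning_parameters`, the dictionary keys, and the example values of T and Γ_T are assumptions, and the CUSUM formula is read with the logarithm inside the square root. The window size is left unrounded, since the quoted text does not say how it is converted to an integer.

```python
import math


def tuning_parameters(horizon: int, n_breakpoints: int) -> dict:
    """Evaluate the quoted tuning rules for a horizon T and Gamma_T breakpoints.

    Hypothetical helper: names and return format are illustrative assumptions.
    """
    T, gamma_T = horizon, n_breakpoints
    if gamma_T < 1 or T <= gamma_T:
        raise ValueError("need 1 <= n_breakpoints < horizon")

    # Sliding window shared by SW-UCB and SW-LB-SDA: tau = 2 * sqrt(T log(T) / Gamma_T)
    window_size = 2.0 * math.sqrt(T * math.log(T) / gamma_T)

    # D-UCB: 1 / (1 - gamma) = 4 * sqrt(T / Gamma_T), solved for gamma
    discount_factor = 1.0 - 1.0 / (4.0 * math.sqrt(T / gamma_T))

    # CUSUM: alpha = sqrt((Gamma_T / T) * log(T / Gamma_T)), h = log(T / Gamma_T)
    cusum_h = math.log(T / gamma_T)
    cusum_alpha = math.sqrt(gamma_T / T * cusum_h)

    return {
        "window_size": window_size,
        "discount_factor": discount_factor,
        "cusum_alpha": cusum_alpha,
        "cusum_h": cusum_h,
    }


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from the paper.
    for name, value in tuning_parameters(horizon=10_000, n_breakpoints=4).items():
        print(f"{name}: {value:.4f}")
```

All four quantities depend on the environment only through T and Γ_T, so every baseline in the comparison is given the same prior knowledge about the number of changes.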