Weighted Linear Bandits for Non-Stationary Environments
Authors: Yoan Russac, Claire Vernade, Olivier Cappé
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments. This section is devoted to the evaluation of the empirical performance of D-LinUCB. We first consider two simulated low-dimensional environments that illustrate the behavior of the algorithms when confronted to either abrupt changes or slow variations of the parameters. |
| Researcher Affiliation | Collaboration | Yoan Russac (CNRS, Inria, ENS, Université PSL; yoan.russac@ens.fr); Claire Vernade (DeepMind; vernade@google.com); Olivier Cappé (CNRS, Inria, ENS, Université PSL; olivier.cappe@cnrs.fr) |
| Pseudocode | Yes | Algorithm 1: D-LinUCB |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | For this experiment, a dataset providing a sample of 30 days of Criteo live traffic data [13] was used. |
| Dataset Splits | No | The paper describes the use of synthetic data and a real dataset but does not provide explicit details about train/validation/test splits by percentages, counts, or specific predefined splits from cited sources. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models, memory specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specific frameworks). |
| Experiment Setup | Yes | For D-LinUCB the discount parameter is chosen as $\gamma = 1 - (B_T/(dT))^{2/3}$. For SW-LinUCB the window's length is set to $l = (dT/B_T)^{2/3}$, where $d = 2$ in the experiment. Those values are theoretically supposed to minimize the asymptotic regret. For the Dynamic Linear UCB algorithm, the badness is estimated from $\tau = 200$ steps, as in the experimental section of [29]. The number of rounds is set to $T = 6000$. ... with 1-subgaussian random noise and Gaussian noise of variance $\sigma^2 = 0.15$. |
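
The tuning rules quoted in the Experiment Setup row can be evaluated numerically. The sketch below is only an illustration, not the authors' code: the function names are hypothetical, and the value of the variation budget B_T is an assumed placeholder, since the excerpt gives d = 2 and T = 6000 but no value for B_T.

```python
def discount_factor(b_t: float, d: int, t: int) -> float:
    """Discount parameter gamma = 1 - (B_T / (d * T))^(2/3) for D-LinUCB."""
    if b_t <= 0 or d <= 0 or t <= 0:
        raise ValueError("b_t, d and t must all be positive")
    return 1.0 - (b_t / (d * t)) ** (2.0 / 3.0)


def window_length(b_t: float, d: int, t: int) -> float:
    """Sliding-window length l = (d * T / B_T)^(2/3) for SW-LinUCB."""
    if b_t <= 0 or d <= 0 or t <= 0:
        raise ValueError("b_t, d and t must all be positive")
    return (d * t / b_t) ** (2.0 / 3.0)


# Values taken from the excerpt above.
d = 2
T = 6000

# ASSUMPTION: the excerpt does not state B_T; 1.0 is a placeholder only.
B_T = 1.0

gamma = discount_factor(B_T, d, T)
window = window_length(B_T, d, T)

print(f"gamma  = {gamma:.6f}")
print(f"window = {window:.1f} rounds")
```

With these two formulas, (1 - γ) · l = 1 for any positive B_T, d, and T, so the sliding window length equals the effective memory 1/(1 - γ) of the discounted estimator.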