Weighted Linear Bandits for Non-Stationary Environments

Authors: Yoan Russac, Claire Vernade, Olivier Cappé

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments. This section is devoted to the evaluation of the empirical performance of D-LinUCB. We first consider two simulated low-dimensional environments that illustrate the behavior of the algorithms when confronted with either abrupt changes or slow variations of the parameters. |
| Researcher Affiliation | Collaboration | Yoan Russac (CNRS, Inria, ENS, Université PSL; yoan.russac@ens.fr); Claire Vernade (DeepMind; vernade@google.com); Olivier Cappé (CNRS, Inria, ENS, Université PSL; olivier.cappe@cnrs.fr) |
| Pseudocode | Yes | Algorithm 1: D-LinUCB |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | For this experiment, a dataset providing a sample of 30 days of Criteo live traffic data [13] was used. |
| Dataset Splits | No | The paper describes the use of synthetic data and a real dataset but does not provide explicit details about train/validation/test splits by percentages, counts, or specific predefined splits from cited sources. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models, memory specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specific frameworks). |
| Experiment Setup | Yes | For D-LinUCB the discount parameter is chosen as $\gamma = 1 - (B_T / (dT))^{2/3}$. For SW-LinUCB the window's length is set to $l = (dT / B_T)^{2/3}$, where $d = 2$ in the experiment. Those values are theoretically supposed to minimize the asymptotic regret. For the Dynamic Linear UCB algorithm, the badness is estimated from $\tau = 200$ steps, as in the experimental section of [29]. The number of rounds is set to $T = 6000$. [...] with 1-subgaussian random noise and Gaussian noise of variance $\sigma^2 = 0.15$. |
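
The two tuning rules quoted in the Experiment Setup entry can be checked numerically. The sketch below is illustrative only and is not taken from the paper's code (none was released): the function names are hypothetical, the variation budget `B_T = 12.0` is an assumed value chosen to give round numbers (the summary does not state the one used in the experiments), and rounding the window length to the nearest integer is also an assumption.

```python
def discount_factor(b_t: float, d: int, horizon: int) -> float:
    """D-LinUCB discount: gamma = 1 - (B_T / (d * T))^(2/3)."""
    return 1.0 - (b_t / (d * horizon)) ** (2.0 / 3.0)


def window_length(b_t: float, d: int, horizon: int) -> int:
    """SW-LinUCB window: l = (d * T / B_T)^(2/3).

    Rounding to the nearest integer (at least 1) is an assumption;
    the quoted text does not say how l is rounded.
    """
    return max(1, round((d * horizon / b_t) ** (2.0 / 3.0)))


# d and T are the values quoted above; B_T is a hypothetical variation budget.
d, T = 2, 6000
B_T = 12.0

gamma = discount_factor(B_T, d, T)
l = window_length(B_T, d, T)

print(f"gamma = {gamma:.4f}, window length = {l}")
```

Under these formulas, 1 - γ equals 1 / l before rounding, so the effective memory of the discounted estimator, 1 / (1 - γ), matches the sliding window's length. Both methods are therefore tuned to forget at the same rate.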