Dynamical Linear Bandits

Authors: Marco Mussi, Alberto Maria Metelli, Marcello Restelli

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct a numerical validation on a synthetic environment and on real-world data to show the effectiveness of DynLin-UCB in comparison with several baselines." and "In this section, we provide numerical validations of DynLin-UCB in both a synthetic scenario and a domain obtained from real-world data."
Researcher Affiliation | Academia | "Marco Mussi (1), Alberto Maria Metelli (1), Marcello Restelli (1)" and "(1) Politecnico di Milano, Milan, Italy."
Pseudocode | Yes | "Algorithm 1: DynLin-UCB."
Open Source Code | Yes | "The code of the experiments can be found at https://github.com/marcomussi/DLB."
Open Datasets | No | "We present an experimental evaluation based on real-world data coming from three web advertising platforms (Facebook, Google, and Bing), related to several campaigns for an invested budget of 5 Million EUR over 2 years." (No specific access information is given for the raw dataset, which was used to build a simulator.)
Dataset Splits | No | The paper describes the parameters of the synthetic environment and the generation of a simulator from real-world data, but it does not specify any explicit training, validation, or test dataset splits.
Hardware Specification | Yes | "The code used for the results provided in this section has been run on an Intel(R) I5 8259U @ 2.30GHz CPU with 8 GB of LPDDR3 system memory."
Software Dependencies | No | "The operating system was macOS 12.2.1, and the experiments have been run on Python 3.9.7." (Only the operating system and Python versions are provided; no library versions are specified.)
Experiment Setup | Yes | "The experiments are presented with a regularization parameter λ ∈ {1, log T} for the algorithms which require it (i.e., DynLin-UCB, Lin-UCB, and D-Lin-UCB). Further information about the hyperparameters of the baselines and the adopted optimistic exploration bounds are presented in Appendix E.1." From Appendix E.1: "For AR2, the hyperparameter α, describing the correlation over time, is considered equal to ρ(A)." and "In the case of Exp3, the rewards are rescaled in order to make them range in [0, 1] with high probability, as follows: [rescaling formula not recoverable from the extracted text; it involves 4ξ, where ξ is defined in terms of Θ, Ω, B, and ρ(A)]."
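
As an illustration of the regularization choice λ ∈ {1, log T} quoted in the Experiment Setup row, the sketch below builds that two-value grid for a given horizon T and shows how each value shrinks a ridge (regularized least-squares) estimate. It is a minimal sketch, not the code from the paper's repository: the function names, the one-dimensional model, and the toy data are all assumptions introduced here for illustration.

```python
import math


def regularization_grid(horizon):
    """Return the candidate regularization values {1, log T} for horizon T."""
    if horizon <= 1:
        raise ValueError("horizon must be greater than 1 so that log T > 0")
    return [1.0, math.log(horizon)]


def ridge_estimate_scalar(features, rewards, lam):
    """Ridge estimate for a one-dimensional model y ~ theta * x.

    Hypothetical helper: theta_hat = sum(x * y) / (lam + sum(x * x)).
    """
    sxy = sum(x * y for x, y in zip(features, rewards))
    sxx = sum(x * x for x in features)
    return sxy / (lam + sxx)


# Toy, noise-free data generated with theta = 2 (an assumption, not paper data).
features = [1.0, 2.0, 3.0]
rewards = [2.0, 4.0, 6.0]

# One estimate per candidate lambda; a larger lambda shrinks the estimate more.
estimates = {
    lam: ridge_estimate_scalar(features, rewards, lam)
    for lam in regularization_grid(1000)
}
```

With T = 1000 the grid is {1, log 1000}, and the estimate obtained with λ = log T lies further below the true value 2 than the one obtained with λ = 1, which is the usual bias introduced by stronger regularization.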