Dynamical Linear Bandits
Authors: Marco Mussi, Alberto Maria Metelli, Marcello Restelli
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct a numerical validation on a synthetic environment and on real-world data to show the effectiveness of Dyn Lin-UCB in comparison with several baselines. and In this section, we provide numerical validations of Dyn Lin-UCB in both a synthetic scenario and a domain obtained from real-world data. |
| Researcher Affiliation | Academia | Marco Mussi¹, Alberto Maria Metelli¹, Marcello Restelli¹ and ¹Politecnico di Milano, Milan, Italy. |
| Pseudocode | Yes | Algorithm 1: Dyn Lin-UCB. |
| Open Source Code | Yes | The code of the experiments can be found at https://github.com/marcomussi/DLB. |
| Open Datasets | No | We present an experimental evaluation based on real-world data coming from three web advertising platforms (Facebook, Google, and Bing), related to several campaigns for an invested budget of 5 Million EUR over 2 years. (No specific access information is given for the raw dataset itself, which was used to build a simulator.) |
| Dataset Splits | No | The paper describes the parameters of the synthetic environment and the generation of a simulator from real-world data, but it does not specify any explicit training, validation, or test dataset splits. |
| Hardware Specification | Yes | The code used for the results provided in this section has been run on an Intel(R) I5 8259U @ 2.30GHz CPU with 8 GB of LPDDR3 system memory. |
| Software Dependencies | No | The operating system was macOS 12.2.1, and the experiments have been run on Python 3.9.7. (Only the operating system and programming language versions are provided, with no specific library versions.) |
| Experiment Setup | Yes | The experiments are presented with a regularization parameter λ ∈ {1, log T} for the algorithms which require it (i.e., Dyn Lin-UCB, Lin-UCB, and D-Lin-UCB). Further information about the hyperparameters of the baselines and the adopted optimistic exploration bounds are presented in Appendix E.1. and, from Appendix E.1: For AR2, the hyperparameter α, describing the correlation over time, is considered equal to ρ(A). and In the case of Exp3, the rewards are rescaled in order to make them range in [0, 1] with high probability, using a rescaling that involves 4ξ, where ξ is defined in terms of Θ, Ω, B, and ρ(A) (see Appendix E.1 for the exact expression). |
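
The regularization choice reported in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code (their repository is linked in the Open Source Code row). It assumes a generic regularized least-squares estimate of the kind used by Lin-UCB-style algorithms, and the names `ridge_estimate`, `regularization_grid`, `theta_true`, and the toy data are hypothetical, introduced only for illustration.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Regularized least-squares estimate: (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(V, X.T @ y)

def regularization_grid(T):
    """The two regularization values reported in the setup: 1 and log T."""
    return [1.0, float(np.log(T))]

# Hypothetical toy problem (assumption: linear rewards with Gaussian noise).
rng = np.random.default_rng(0)
T, d = 1000, 3
theta_true = np.array([0.5, -0.2, 0.1])
X = rng.normal(size=(T, d))
y = X @ theta_true + 0.1 * rng.normal(size=T)

# One estimate per regularization value in {1, log T}.
estimates = {lam: ridge_estimate(X, y, lam) for lam in regularization_grid(T)}
```

With a horizon this long, both values of λ should yield nearly identical estimates. The choice matters more in early rounds, when few samples are available and the regularizer dominates the design matrix.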