Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We prove that BANDITSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks."
Researcher Affiliation | Collaboration | Andrea Tirinzoni (Meta, tirinzoni@meta.com); Matteo Papini (Universitat Pompeu Fabra, matteo.papini@upf.edu); Ahmed Touati (Meta, atouati@meta.com); Alessandro Lazaric (Meta, lazaric@meta.com); Matteo Pirotta (Meta, pirotta@meta.com)
Pseudocode | Yes | Algorithm 1 (BANDITSRL)
Open Source Code | No | The paper states 'The code is available at the following URL.' but provides the placeholder 'URL' instead of a concrete link.
Open Datasets | Yes | "The dataset-based problems statlog, magic, covertype, mushroom [34, 37] are obtained from the standard multiclass-to-bandit conversion [6, 27]." (A minimal sketch of this conversion is given after the table.)
Dataset Splits | No | The paper mentions using 'standard benchmarks' and 'dataset-based problems' but does not specify exact train/validation/test split percentages or sample counts.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models or memory used to run the experiments; the checklist included with the paper also answers 'No' for this item.
Software Dependencies | No | The paper cites PyTorch in the bibliography but does not specify its version, nor the versions of any other software dependencies used in the experiments.
Experiment Setup | Yes | "In all the problems the reward function is highly non-linear w.r.t. contexts and actions and we use a network composed of layers of dimension [50, 50, 50, 50, 10] and ReLU activation to learn the representation (i.e., d = 10). For the baseline algorithms (NEURALUCB, IGW) we report the regret of the best configuration on each individual dataset, while for NN-BANDITSRL we fix the parameters across datasets (i.e., αGLRT = 5)." (A sketch of this representation network also follows the table.)
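
The Open Datasets row refers to the standard multiclass-to-bandit conversion. Below is a minimal sketch of that recipe, assuming the usual construction (contexts are the examples' feature vectors, one action per class, reward 1 iff the chosen action matches the true label); the class name and API are hypothetical, not the authors' code.

```python
import numpy as np

class MulticlassBandit:
    """Standard multiclass-to-bandit conversion (hypothetical sketch).

    Each round reveals the feature vector of one dataset example as the
    context; there is one action per class, and the reward is 1 if the
    chosen action equals the example's true label, 0 otherwise.
    """

    def __init__(self, X, y, seed=0):
        self.X = np.asarray(X, dtype=float)     # features -> contexts
        self.y = np.asarray(y, dtype=int)       # labels   -> optimal actions
        self.n_actions = int(self.y.max()) + 1  # assumes 0-indexed labels
        self._rng = np.random.default_rng(seed)
        self._i = None                          # index of the current example

    def observe(self):
        # Sample an example uniformly at random and reveal its features.
        self._i = int(self._rng.integers(len(self.X)))
        return self.X[self._i]

    def step(self, action):
        # Binary reward: did the learner pick the true class?
        return float(action == self.y[self._i])
```

With statlog, magic, covertype, or mushroom loaded as a feature matrix `X` and integer labels `y`, a learner interacts by alternating `env.observe()` and `env.step(action)`.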
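
The Experiment Setup row fixes the representation architecture. Here is a minimal PyTorch sketch consistent with that description; the input dimension (i.e., how a context-action pair is encoded), the linear reward head, and whether the final 10-dimensional layer is itself followed by a ReLU are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Sketch of the reported architecture: layers of dimension
    [50, 50, 50, 50, 10] with ReLU activations, so the learned
    representation has d = 10. Not the authors' implementation;
    `input_dim` (the context-action encoding size) is an assumption.
    """

    def __init__(self, input_dim: int, sizes=(50, 50, 50, 50, 10)):
        super().__init__()
        layers, prev = [], input_dim
        for h in sizes:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        # phi(x, a): the learned d-dimensional representation.
        self.phi = nn.Sequential(*layers)
        # Linear reward head on top of phi, as in a linear contextual bandit.
        self.head = nn.Linear(prev, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.phi(x)).squeeze(-1)

# Example: predicted rewards for a batch of 32 encoded context-action pairs,
# using a placeholder encoding size of 64.
net = RepresentationNet(input_dim=64)
r_hat = net(torch.randn(32, 64))  # shape: (32,)
```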