Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that BANDITSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks. |
| Researcher Affiliation | Collaboration | Andrea Tirinzoni (Meta, tirinzoni@meta.com); Matteo Papini (Universitat Pompeu Fabra, matteo.papini@upf.edu); Ahmed Touati (Meta, atouati@meta.com); Alessandro Lazaric (Meta, lazaric@meta.com); Matteo Pirotta (Meta, pirotta@meta.com) |
| Pseudocode | Yes | Algorithm 1 BANDITSRL |
| Open Source Code | No | The paper states 'The code is available at the following URL.' but provides a placeholder 'URL' instead of a concrete link. |
| Open Datasets | Yes | The dataset-based problems statlog, magic, covertype, and mushroom [34, 37] are obtained from the standard multiclass-to-bandit conversion [6, 27] (a hedged sketch of this conversion appears after the table). |
| Dataset Splits | No | The paper mentions using 'standard benchmarks' and 'dataset-based problems' but does not specify exact train/validation/test split percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running experiments. The checklist provided with the paper also states 'No' for this information. |
| Software Dependencies | No | The paper mentions 'Pytorch' in the bibliography but does not specify its version or list other software dependencies and the versions used in the experiments. |
| Experiment Setup | Yes | In all the problems the reward function is highly non-linear w.r.t. contexts and actions and we use a network composed of layers of dimension [50, 50, 50, 50, 10] and ReLU activation to learn the representation (i.e., d = 10). For the baseline algorithms (NEURALUCB, IGW) we report the regret of the best configuration on each individual dataset, while for NN-BANDITSRL we fix the parameters across datasets (i.e., α_GLRT = 5). (Hedged sketches of this network and of a GLRT-style exploit test appear after the table.) |
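The four dataset-based problems rely on the standard multiclass-to-bandit conversion: each class label becomes an arm, the context is the example's feature vector, and the learner receives reward 1 if the pulled arm matches the true label and 0 otherwise. Below is a minimal sketch of that conversion, assuming a generic `policy(context, n_arms)` callable (hypothetical, not from the paper):

```python
import numpy as np

def multiclass_to_bandit(X: np.ndarray, y: np.ndarray, policy, seed: int = 0):
    """Standard multiclass-to-bandit conversion: contexts are feature vectors,
    arms are class labels, and the reward is 1 iff the pulled arm equals the
    true label. `policy(context, n_arms)` is a hypothetical arm-selection rule.
    """
    rng = np.random.default_rng(seed)
    n_arms = int(y.max()) + 1
    rewards = []
    for t in rng.permutation(len(X)):  # stream the examples in random order
        arm = policy(X[t], n_arms)
        rewards.append(1.0 if arm == y[t] else 0.0)  # bandit feedback only
    return np.array(rewards)

# Toy usage with a uniformly random policy on synthetic data:
X = np.random.randn(100, 8)
y = np.random.randint(0, 4, size=100)
print(multiclass_to_bandit(X, y, lambda c, k: np.random.randint(k)).mean())
```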
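The experiment-setup row pins down the representation network: layer widths [50, 50, 50, 50, 10] with ReLU activations and a 10-dimensional output playing the role of φ(x, a). Here is a minimal PyTorch sketch under those assumptions; the input dimension and the placement of ReLU only between layers are guesses not stated in the excerpt:

```python
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Feature extractor with the reported layer widths [50, 50, 50, 50, 10].

    `in_dim` (the size of the context-action encoding) is an assumption, as is
    applying ReLU only between layers. The 10-dim output is phi(x, a), which
    feeds a linear reward head as in linear contextual bandits.
    """

    def __init__(self, in_dim: int):
        super().__init__()
        widths = [50, 50, 50, 50, 10]
        layers, prev = [], in_dim
        for i, w in enumerate(widths):
            layers.append(nn.Linear(prev, w))
            if i < len(widths) - 1:  # no activation after the final embedding (assumption)
                layers.append(nn.ReLU())
            prev = w
        self.phi = nn.Sequential(*layers)            # representation phi(x, a), d = 10
        self.head = nn.Linear(prev, 1, bias=False)   # linear reward estimate theta^T phi

    def forward(self, xa: torch.Tensor) -> torch.Tensor:
        # Predicted scalar reward for a batch of encoded context-action pairs.
        return self.head(self.phi(xa))
```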
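Finally, α_GLRT = 5 is the threshold of the generalized likelihood ratio test that BANDITSRL uses to decide when to play greedily rather than defer to the base no-regret algorithm. The following is a schematic, not the paper's Algorithm 1: a GLRT-style test in its simplest linear form, exploiting only when the empirically best arm's estimated gap over every other arm exceeds a confidence width scaled by α_GLRT. All names here are illustrative.

```python
import numpy as np

def glrt_exploit_or_explore(theta_hat, V, arm_features, alpha_glrt=5.0):
    """GLRT-style exploit test (illustrative, not the paper's exact statistic):
    return the greedy arm when its estimated advantage over every other arm
    dominates the corresponding confidence width; otherwise return None to
    signal that the base no-regret algorithm should choose the action.
    """
    V_inv = np.linalg.inv(V)                 # inverse design matrix
    values = arm_features @ theta_hat        # estimated rewards per arm
    best = int(values.argmax())
    for a in range(len(arm_features)):
        if a == best:
            continue
        diff = arm_features[best] - arm_features[a]
        width = np.sqrt(alpha_glrt * diff @ V_inv @ diff)
        if values[best] - values[a] <= width:
            return None                      # test fails: keep exploring
    return best                              # test passes: play greedily

# Toy usage: 3 arms in R^2 with a well-estimated model (prints 0).
theta_hat = np.array([1.0, 0.0])
V = 100 * np.eye(2)                          # large design matrix -> tight widths
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(glrt_exploit_or_explore(theta_hat, V, arms))
```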