Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We prove that BANDITSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks."
Researcher Affiliation | Collaboration | Andrea Tirinzoni (Meta), Matteo Papini (Universitat Pompeu Fabra), Ahmed Touati (Meta), Alessandro Lazaric (Meta), Matteo Pirotta (Meta)
Pseudocode | Yes | Algorithm 1: BANDITSRL
Open Source Code | No | The paper states "The code is available at the following URL." but provides the placeholder "URL" instead of a concrete link.
Open Datasets | Yes | The dataset-based problems (statlog, magic, covertype, mushroom [34–37]) are obtained from the standard multiclass-to-bandit conversion [6, 27].
Dataset Splits | No | The paper mentions "standard benchmarks" and "dataset-based problems" but does not specify exact train/validation/test split percentages or sample counts.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processors, or memory used for the experiments; the checklist accompanying the paper also answers "No" for this item.
Software Dependencies | No | The paper mentions PyTorch in the bibliography but does not specify its version number, nor the versions of any other software dependencies used in the experiments.
Experiment Setup | Yes | "In all the problems the reward function is highly non-linear w.r.t. contexts and actions and we use a network composed of layers of dimension [50, 50, 50, 50, 10] and ReLU activation to learn the representation (i.e., d = 10). For the baseline algorithms (NEURALUCB, IGW) we report the regret of the best configuration on each individual dataset, while for NN-BANDITSRL we fix the parameters across datasets (i.e., αGLRT = 5)."
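The "Open Datasets" row refers to the standard multiclass-to-bandit conversion: each labeled example becomes a bandit round in which the arms are the classes and the reward is 1 exactly when the pulled arm matches the true label. The sketch below illustrates this conversion under common assumptions (a disjoint per-arm feature encoding, binary rewards); the function names are illustrative and not taken from the paper's code, which is not publicly available.

```python
import numpy as np

def disjoint_arm_features(x, arm, n_arms):
    """Disjoint encoding often used with linear bandits: copy the raw
    feature vector x into the block belonging to the chosen arm, so the
    joint feature dimension is n_arms * len(x). (Assumed encoding, not
    necessarily the one used in the paper.)"""
    d = len(x)
    phi = np.zeros(n_arms * d)
    phi[arm * d:(arm + 1) * d] = x
    return phi

def bandit_round(x, label, arm, n_arms):
    """Multiclass-to-bandit conversion for one example (x, label):
    pulling `arm` yields reward 1 if it equals the true class, else 0."""
    reward = 1.0 if arm == label else 0.0
    return disjoint_arm_features(x, arm, n_arms), reward

# Example: a 3-feature example with true class 2 out of 4 classes.
phi, r = bandit_round(np.array([1.0, 2.0, 3.0]), label=2, arm=2, n_arms=4)
```

Under this conversion, an algorithm's classification error rate on the dataset lower-bounds its per-round bandit regret, which is why these datasets serve as standard contextual-bandit benchmarks.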
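The "Experiment Setup" row describes a network with hidden layers of width [50, 50, 50, 50] feeding a 10-dimensional output, so the final layer produces the learned representation φ (d = 10) on top of which a linear reward model is fit. A minimal NumPy sketch of such a forward pass is below; the paper's implementation uses PyTorch, and the initialization and function names here are assumptions for illustration only.

```python
import numpy as np

def init_mlp(in_dim, hidden=(50, 50, 50, 50), rep_dim=10, seed=0):
    """He-style random initialization (an assumed choice) for an MLP with
    the layer widths reported in the paper: [50, 50, 50, 50, 10]."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, rep_dim)
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def representation(params, x):
    """Forward pass with ReLU after every hidden layer; the final 10-d
    output is the learned representation phi(x, a). The predicted reward
    would then be theta @ phi for a separately fit linear head."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:       # no activation on the output layer
            h = np.maximum(h, 0.0)
    return h

# Example: a context-action input of (assumed) dimension 20.
params = init_mlp(in_dim=20)
phi = representation(params, np.ones(20))   # shape (10,)
```

Keeping the representation dimension small (d = 10) is what makes the final linear-bandit layer tractable, and it is this learned φ that BANDITSRL regularizes toward the HLS property.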