Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Contextual Bandit Bake-off

Authors: Alberto Bietti, Alekh Agarwal, John Langford

JMLR 2021

Each entry below gives the reproducibility variable, the classified result, and the LLM response used as evidence.
Variable: Research Type
Result: Experimental
LLM Response: We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. ... The main objective of our work is an evaluation of practical methods that are relevant to practitioners.
Variable: Researcher Affiliation
Result: Collaboration
LLM Response: Alberto Bietti (Center for Data Science, New York University, New York, NY); Alekh Agarwal (Microsoft Research, Redmond, WA); John Langford (Microsoft Research, New York, NY)
Variable: Pseudocode
Result: Yes
LLM Response: Algorithm 1: Generic contextual bandit algorithm; Algorithm 2: ϵ-greedy; Algorithm 3: Bag / Online BTS; Algorithm 4: Cover; Algorithm 5: RegCB
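Of the listed algorithms, ϵ-greedy is the simplest: explore uniformly at random with probability ϵ, otherwise play the action the current policy predicts has lowest cost. A minimal sketch of that action-selection step (function and variable names are illustrative, not the paper's implementation):

```python
import random

def epsilon_greedy_action(predicted_costs, epsilon, rng=random):
    """Return an action index for one round of ϵ-greedy.

    predicted_costs: per-action cost estimates from the current policy.
    epsilon: probability of exploring uniformly at random.
    """
    k = len(predicted_costs)
    if rng.random() < epsilon:
        return rng.randrange(k)          # explore: uniform over actions
    return min(range(k), key=lambda a: predicted_costs[a])  # exploit
```

With ϵ = 0 this reduces to pure exploitation of the cost predictor; the paper's other methods (Bag, Cover, RegCB) replace this selection rule with richer exploration strategies.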
Variable: Open Source Code
Result: Yes
LLM Response: The evaluation code is available at https://github.com/albietz/cb_bakeoff. All methods presented in this section are available in Vowpal Wabbit. For reproducibility purposes, the precise version of VW used to run these experiments is available at https://github.com/albietz/vowpal_wabbit/tree/bakeoff.
Variable: Open Datasets
Result: Yes
LLM Response: We consider a large collection of over 500 datasets with varying characteristics and various cost structures, including multiclass, multilabel and more general cost-sensitive datasets with real-valued costs. ... We consider a collection of 516 multiclass classification datasets from the openml.org platform... We consider 5 multilabel datasets from the LibSVM website... Microsoft Learning to Rank dataset, variant MSLR-30K at https://www.microsoft.com/en-us/research/project/mslr/, and the Yahoo! Learning to Rank Challenge V2.0, variant C14B at https://webscope.sandbox.yahoo.com/catalog.php?datatype=c. The datasets we used can be accessed at https://www.openml.org/d/<id>, with id in the following list: [list of IDs]
Variable: Dataset Splits
Result: No
LLM Response: Because of the online setup, we consider one or more fixed, shuffled orderings of each dataset. The datasets widely vary in noise levels, and number of actions, features, examples etc., allowing us to model varying difficulties in CB problems. ... The performance of method A on a dataset of size n is measured by the progressive validation loss (Blum et al., 1999): (1/n) Σ_{t=1}^{n} c_t(a_t), where a_t is the action chosen by the algorithm on the t-th example, and c_t the true cost vector.
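The progressive validation loss is simply the average cost of the actions the learner actually chose while streaming through the (shuffled) dataset. A minimal sketch, with illustrative names:

```python
def progressive_validation_loss(cost_vectors, chosen_actions):
    """Progressive validation loss (Blum et al., 1999):
    (1/n) * sum over t of c_t(a_t), where a_t is the action the
    algorithm chose on the t-th example and c_t its true cost vector."""
    n = len(chosen_actions)
    return sum(c[a] for c, a in zip(cost_vectors, chosen_actions)) / n
```

Because each a_t is chosen before the learner sees c_t, this running average behaves like a held-out estimate without needing an explicit train/test split, which is why the paper reports no dataset splits.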
Variable: Hardware Specification
Result: No
LLM Response: The paper does not mention the specific hardware (e.g., CPU or GPU models, memory) used to run the experiments. It refers to Vowpal Wabbit as an online learning system and discusses online learning in production systems, but does not describe the experimental hardware.
Variable: Software Dependencies
Result: No
LLM Response: All of our experiments are based on the online learning system Vowpal Wabbit which has already been successfully used in production systems (Agarwal et al., 2016). ... For reproducibility purposes, the precise version of VW used to run these experiments is available at https://github.com/albietz/vowpal_wabbit/tree/bakeoff. While a link to a specific branch of Vowpal Wabbit is provided, the paper does not explicitly state a version number (e.g., 8.9.0) for Vowpal Wabbit or any other ancillary software dependencies in the text.
Variable: Experiment Setup
Result: Yes
LLM Response: We ran each method on every dataset with different choices of algorithm-specific hyperparameters, learning rates, reductions, and loss encodings. Details are given in Appendix C.1 (Algorithms and Hyperparameters): We ran each method on every dataset with the following hyperparameters: algorithm-specific hyperparameters, shown in Table 9; 9 choices of learning rates, on a logarithmic grid from 0.001 to 10; 3 choices of reductions: IPS, DR and IWR; 3 choices of loss encodings: 0/1, -1/0 and 9/10 (see Eq. (7)).
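The learning-rate grid described above (9 points, logarithmically spaced from 0.001 to 10) can be generated with a short helper; the function name is illustrative and not from the paper's code:

```python
def log_grid(lo, hi, n):
    """Return n geometrically spaced values from lo to hi inclusive,
    i.e. evenly spaced on a log scale."""
    ratio = (hi / lo) ** (1 / (n - 1))
    return [lo * ratio**i for i in range(n)]

# The paper's grid: 0.001, ~0.00316, 0.01, ..., ~3.16, 10
learning_rates = log_grid(1e-3, 10.0, 9)
```

Each step multiplies by (10 / 0.001)^(1/8) = 10^0.5 ≈ 3.16, so every other grid point lands on a power of ten.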