Contextual Bandits with Large Action Spaces: Made Practical

Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines."
Researcher Affiliation | Collaboration | University of Wisconsin-Madison; Microsoft Research NYC.
Pseudocode | Yes | Algorithm 1 (SpannerGreedy), Algorithm 2 (SpannerIGW), Algorithm 3 (ReweightedSpanner), Algorithm 4 (IGW-ArgMax), Algorithm 5 (Approximate Barycentric Spanner; Awerbuch & Kleinberg, 2008). (A hedged sketch of the spanner subroutine appears below the table.)
Open Source Code | Yes | "Code to reproduce all results is available at https://github.com/pmineiro/linrepcb."
Open Datasets | Yes | "We conduct experiments on three datasets, whose details are summarized in Table 1. oneshotwiki (Singh et al., 2012; Vasnetsov, 2018) is a named-entity recognition task... amazon-3m (Bhatia et al., 2016) is an extreme multi-label dataset..."
Dataset Splits | No | The paper describes an online learning setting and reports "progressive-validation reward," but it does not specify explicit train/validation/test splits with percentages or sample counts.
Hardware Specification | Yes | "CPU timings use batch size 1 on an Azure STANDARD D4 V2 machine. GPU timings use batch size 1024 on an Azure STANDARD NC6S V2 (Nvidia P100-based) machine."
Software Dependencies | No | The paper mentions PyTorch and the sentence-transformers package but does not provide version numbers for these dependencies.
Experiment Setup | No | The paper states that hyperparameters are tuned via random search and that the Adam optimizer is used, but it does not report the selected values (e.g., learning rate, batch size) or the optimizer's settings. (A hedged sketch of such a tuning loop appears below.)
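
The central primitive in the pseudocode listed above is the barycentric spanner. For concreteness, here is a minimal NumPy sketch of the C-approximate barycentric spanner construction of Awerbuch & Kleinberg (2008), which the paper's Algorithm 5 adapts. The function name, the (n, d) array layout, and the default C = 2 are illustrative assumptions, not details taken from the authors' released code.

    import numpy as np

    def approx_barycentric_spanner(actions, C=2.0):
        # actions: (n, d) array whose rows span R^d.
        # Returns indices of d rows forming a C-approximate barycentric
        # spanner: every row of `actions` can be written as a combination
        # of the chosen rows with coefficients in [-C, C]
        # (Awerbuch & Kleinberg, 2008).
        n, d = actions.shape
        basis = np.eye(d)      # placeholder rows, replaced one at a time
        idx = [-1] * d

        # Phase 1: greedily pick each basis row to maximize |det|.
        for i in range(d):
            dets = []
            for j in range(n):
                trial = basis.copy()
                trial[i] = actions[j]
                dets.append(abs(np.linalg.det(trial)))
            idx[i] = int(np.argmax(dets))
            basis[i] = actions[idx[i]]

        # Phase 2: swap in any action that grows |det| by more than a
        # factor of C; terminates because |det| increases geometrically.
        improved = True
        while improved:
            improved = False
            cur = abs(np.linalg.det(basis))
            for i in range(d):
                for j in range(n):
                    trial = basis.copy()
                    trial[i] = actions[j]
                    if abs(np.linalg.det(trial)) > C * cur:
                        basis[i], idx[i] = actions[j], j
                        cur = abs(np.linalg.det(basis))
                        improved = True
        return idx

Each sweep costs O(n * d) determinant evaluations, and the payoff is that exploration can then be driven by the d spanner actions rather than all n, which is what makes spanner-based methods attractive for large action spaces.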
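
On the Experiment Setup gap: the paper reports tuning hyperparameters by random search and optimizing with Adam, without giving the chosen values. The sketch below shows what such a tuning loop could look like in PyTorch; the search ranges, the synthetic data, the stand-in linear reward model, and the negative-MSE score are all assumptions for illustration (the paper's own selection criterion is progressive-validation reward).

    import random
    import torch
    import torch.nn as nn

    def sample_config():
        # Hypothetical search space; the paper does not report its ranges.
        return {"lr": 10 ** random.uniform(-5.0, -2.0),
                "batch_size": random.choice([32, 64, 128])}

    def run_trial(cfg, n_steps=200, dim=16):
        torch.manual_seed(0)                   # reproducible trials
        model = nn.Linear(dim, 1)              # stand-in reward regressor
        opt = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
        total = 0.0
        for _ in range(n_steps):
            x = torch.randn(cfg["batch_size"], dim)   # synthetic contexts
            y = x @ torch.ones(dim, 1) + 0.1 * torch.randn(cfg["batch_size"], 1)
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total -= loss.item()               # negate: higher score is better
        return total / n_steps

    # Random search: draw 20 configs, keep the highest-scoring one.
    best = max((sample_config() for _ in range(20)), key=run_trial)
    print("best config found:", best)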