Meta-Learning Effective Exploration Strategies for Contextual Bandits

Authors: Amr Sharaf, Hal Daumé III (pp. 9541-9548)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate MÊLÉE on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks.
Researcher Affiliation | Collaboration | Amr Sharaf (1), Hal Daumé III (1,2); 1: University of Maryland, 2: Microsoft Research
Pseudocode | Yes | Algorithm 1 MÊLÉE (supervised training sets {S_m}, hypothesis class F, exploration rate µ, number of validation examples N_Val, feature extractor Φ)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for its methodology or links to a code repository.
Open Datasets | Yes | The dataset we consider is the Microsoft Learning to Rank dataset, variant MSLR-10K from (Qin and Liu 2013). ... Following Bietti, Agarwal, and Langford (2018), we use a collection of 300 binary classification datasets from openml.org for evaluation. (See the OpenML loading sketch after the table.)
Dataset Splits | Yes | Algorithm 1 MÊLÉE ... step 4: partition and permute S randomly into train Tr and validation Val where |Val| = N_Val. (See the train/validation split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions methods like 'Platt's scaling' and 'AggreVaTe' but does not specify software packages or libraries with version numbers.
Experiment Setup | Yes | In all cases, the underlying classifier f is a linear model trained with an optimizer that runs stochastic gradient descent. ... In our experiments we use only 30 fully labeled examples... In practice (§5), we find that setting µ = 0 is optimal in aggregate... To avoid correlations between the observed query-url pairs, we group the queries by the query ID, and sample a single query from each group. ... we repeat the experiment 16 times with randomly shuffled permutations of the MSLR-10K dataset. (See the experiment-setup sketch after the table.)
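
For the pseudocode and dataset-split rows, here is a minimal Python sketch of the random train/validation partition described in step 4 of Algorithm 1 (MÊLÉE). The helper name partition_train_val and the list-of-pairs representation of S are our assumptions for illustration, not the paper's code.

```python
import numpy as np

def partition_train_val(S, n_val, seed=0):
    """Randomly permute a supervised set S and split off n_val validation
    examples, mirroring step 4 of Algorithm 1: Val gets |Val| = n_val
    examples and Tr gets the rest. S is assumed to be a list of (x, y) pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(S))
    val = [S[i] for i in idx[:n_val]]
    tr = [S[i] for i in idx[n_val:]]
    return tr, val

# Toy usage with N_Val = 10 on a 100-example supervised set.
S = [(np.random.randn(5), np.random.randint(2)) for _ in range(100)]
Tr, Val = partition_train_val(S, n_val=10)
assert len(Val) == 10 and len(Tr) == 90
```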
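
For the open-datasets row, a hedged sketch of pulling a single OpenML dataset with scikit-learn's fetch_openml. The data_id used below (31, the "credit-g" dataset) is only a placeholder: the IDs of the 300 binary classification datasets follow Bietti, Agarwal, and Langford (2018) and are not listed in this review.

```python
from sklearn.datasets import fetch_openml

# Placeholder: dataset ID 31 ("credit-g") stands in for any one of the 300
# binary classification datasets; the actual ID list is not reproduced here.
bunch = fetch_openml(data_id=31, as_frame=True)
X, y = bunch.data, bunch.target
print(X.shape, y.value_counts())
```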
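
For the experiment-setup row, a minimal pandas sketch of the two MSLR-10K preprocessing steps quoted above: keep a single randomly sampled query-url row per query ID, then repeat the run over 16 independently shuffled permutations. The column name query_id and the DataFrame representation are assumptions about how the data has been loaded, not part of the paper.

```python
import pandas as pd

def sample_one_per_query(df, qid_col="query_id", seed=0):
    """Keep one randomly chosen query-url row per query ID, avoiding the
    within-query correlations mentioned in the paper (pandas >= 1.1)."""
    return df.groupby(qid_col, group_keys=False).sample(n=1, random_state=seed)

def shuffled_repeats(df, n_repeats=16, seed=0):
    """Yield independently shuffled permutations of the dataset, one per run."""
    for r in range(n_repeats):
        yield df.sample(frac=1.0, random_state=seed + r).reset_index(drop=True)

# Usage, assuming `mslr` is a DataFrame with a query_id column:
# subsampled = sample_one_per_query(mslr)
# for run, permuted in enumerate(shuffled_repeats(subsampled)):
#     ...  # train the linear SGD classifier and evaluate the exploration policy
```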