Meta-Learning Effective Exploration Strategies for Contextual Bandits
Authors: Amr Sharaf, Hal Daumé III (pp. 9541-9548)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MÊLÉE on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks. |
| Researcher Affiliation | Collaboration | Amr Sharaf,1 Hal Daume III1,2 1 University of Maryland 2 Microsoft Research |
| Pseudocode | Yes | Algorithm 1 MÊLÉE (supervised training sets {Sm}, hypothesis class F, exploration rate µ, number of validation examples NVal, feature extractor Φ) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for its methodology or links to a code repository. |
| Open Datasets | Yes | The dataset we consider is the Microsoft Learning to Rank dataset, variant MSLR-10K from (Qin and Liu 2013). ... Following Bietti, Agarwal, and Langford (2018), we use a collection of 300 binary classification datasets from openml.org for evaluation. |
| Dataset Splits | Yes | Algorithm 1 MÊLÉE ... 4: partition and permute S randomly into train Tr and validation Val where |Val| = NVal |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions methods like 'Platt's scaling' and 'AggreVaTe' but does not specify software packages or libraries with version numbers. |
| Experiment Setup | Yes | In all cases, the underlying classifier f is a linear model trained with an optimizer that runs stochastic gradient descent. ... In our experiments we use only 30 fully labeled examples... In practice (§5), we find that setting µ = 0 is optimal in aggregate... To avoid correlations between the observed query-url pairs, we group the queries by the query ID, and sample a single query from each group. ... we repeat the experiment 16 times with randomly shuffled permutations of the MSLR-10K dataset. |
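The dataset-split step quoted above (line 4 of Algorithm 1) is simple to reproduce. The sketch below is an illustrative implementation of that random partition-and-permute step; the function name, seed handling, and toy data are assumptions, not from the paper.

```python
import random


def partition_train_val(S, n_val, seed=0):
    """Randomly permute a supervised set S and split off n_val validation
    examples, mirroring line 4 of Algorithm 1: partition and permute S
    randomly into train Tr and validation Val where |Val| = n_val.
    (Function name and seed parameter are illustrative assumptions.)"""
    rng = random.Random(seed)
    permuted = list(S)
    rng.shuffle(permuted)
    val = permuted[:n_val]       # validation set Val, |Val| = n_val
    tr = permuted[n_val:]        # training set Tr, the remainder
    return tr, val


# Toy usage: 30 labeled examples, matching the paper's note that only
# 30 fully labeled examples are used per simulated problem.
S = [(i, i % 2) for i in range(30)]
tr, val = partition_train_val(S, n_val=10)
```

Fixing the seed makes the split reproducible across runs, while varying it supports the repeated-shuffle protocol described in the experiment setup.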