Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Meta-Learning Effective Exploration Strategies for Contextual Bandits
Authors: Amr Sharaf, Hal Daumé III | pp. 9541-9548
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MÊLÉE on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks. |
| Researcher Affiliation | Collaboration | Amr Sharaf (1), Hal Daumé III (1, 2); (1) University of Maryland, (2) Microsoft Research |
| Pseudocode | Yes | Algorithm 1: MÊLÉE (supervised training sets {Sm}, hypothesis class F, exploration rate µ, number of validation examples NVal, feature extractor Φ) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for its methodology or links to a code repository. |
| Open Datasets | Yes | The dataset we consider is the Microsoft Learning to Rank dataset, variant MSLR-10K from (Qin and Liu 2013). ... Following Bietti, Agarwal, and Langford (2018), we use a collection of 300 binary classification datasets from openml.org for evaluation. |
| Dataset Splits | Yes | Algorithm 1: MÊLÉE ... 4: partition and permute S randomly into train Tr and validation Val where \|Val\| = NVal |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions methods like 'Platt's scaling' and 'AggreVaTe' but does not specify software packages or libraries with version numbers. |
| Experiment Setup | Yes | In all cases, the underlying classifier f is a linear model trained with an optimizer that runs stochastic gradient descent. ... In our experiments we use only 30 fully labeled examples... In practice (§5), we find that setting µ = 0 is optimal in aggregate... To avoid correlations between the observed query-url pairs, we group the queries by the query ID, and sample a single query from each group. ... we repeat the experiment 16 times with randomly shuffled permutations of the MSLR-10K dataset. |
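
Two of the data-handling steps quoted in the table (the random train/validation partition from Algorithm 1, step 4, and the one-sample-per-query-ID grouping) can be illustrated with a minimal Python sketch. This is not the authors' code, which the table notes was not released. The function names `partition_train_val` and `sample_one_per_query`, the `(query_id, payload)` row layout, the seed, and the value `n_val=20` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def partition_train_val(examples, n_val, rng):
    """Randomly permute `examples`, then split off `n_val` items as validation.

    Hypothetical helper mirroring "partition and permute S randomly into
    train Tr and validation Val where |Val| = NVal".
    """
    shuffled = list(examples)
    if not 0 <= n_val <= len(shuffled):
        raise ValueError("n_val must be between 0 and len(examples)")
    rng.shuffle(shuffled)          # in-place random permutation
    val = shuffled[:n_val]         # first n_val items form the validation set
    train = shuffled[n_val:]       # the remainder forms the training set
    return train, val


def sample_one_per_query(rows, rng):
    """Group (query_id, payload) rows by query ID and keep one random row per group.

    Hypothetical helper; the row layout is an assumption, not the MSLR-10K format.
    """
    groups = defaultdict(list)
    for query_id, payload in rows:
        groups[query_id].append(payload)
    return {qid: rng.choice(payloads) for qid, payloads in groups.items()}


# Illustrative usage with made-up data and an arbitrary validation size.
rng = random.Random(0)
train, val = partition_train_val(range(100), n_val=20, rng=rng)
rows = [("q1", "url_a"), ("q1", "url_b"), ("q2", "url_c")]
picked = sample_one_per_query(rows, rng)
```

Passing an explicit `random.Random` instance keeps each run seedable, so an experiment repeated over several shuffled permutations (16 in the quoted setup) could use one seed per repetition.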