Contextual semibandits via supervised learning oracles

Authors: Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate this algorithm on two large-scale learning-to-rank datasets and compare with other contextual semibandit approaches. These experiments comprehensively demonstrate that effective exploration over a rich policy class can lead to significantly better performance than existing approaches.
Researcher Affiliation | Collaboration | College of Information and Computer Sciences, University of Massachusetts, Amherst, MA; Microsoft Research, New York, NY
Pseudocode | Yes | Algorithm 1: VCEE (Variance-Constrained Explore-Exploit)
Open Source Code | Yes | Software is available at http://github.com/akshaykr/oracle_cb.
Open Datasets | Yes | We used two large-scale learning-to-rank datasets: MSLR [17] and all folds of the Yahoo! Learning-to-Rank dataset [5]. [17] MSLR: Microsoft Learning to Rank Dataset. http://research.microsoft.com/en-us/projects/mslr/. [5] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge overview. In Yahoo! Learning to Rank Challenge, 2011.
Dataset Splits | No | The paper mentions using the MSLR and Yahoo! Learning-to-Rank datasets and performing parameter tuning, but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts) needed to reproduce the evaluation.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper states that software is available online but does not list specific software dependencies with version numbers (e.g., Python or specific library versions).
Experiment Setup | Yes | For MSLR, we choose K = 10 documents per query and set L = 3, while for Yahoo!, we set K = 6 and L = 2. All algorithms make a single pass over the queries. For computational reasons, we only update π̂_t and Q_t every 100 rounds. For VCEE, we set µ_t = c√(1/(KLT)) and tune c... We ran each algorithm for 10 repetitions, for each of ten logarithmically spaced parameter values. We consider three policy classes: linear functions and depth-2 and depth-5 gradient boosted regression trees (abbreviated Lin, GB2 and GB5). Both GB classes use 50 trees.
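A minimal sketch (Python) of the experiment configuration quoted in the row above. The names DATASETS, PARAM_GRID, POLICY_CLASSES, and mu_t are illustrative assumptions and are not taken from the authors' oracle_cb repository; the horizon T and the numeric range of the tuning grid are placeholders, since the paper's exact values are not quoted here.

import numpy as np

# Per-dataset semibandit parameters quoted above:
# K documents available per query, L slots in the presented ranking.
DATASETS = {
    "MSLR":  {"K": 10, "L": 3},
    "Yahoo": {"K": 6,  "L": 2},
}

# Ten logarithmically spaced values for the tuned constant c
# (the endpoints -2 and 1 are placeholders, not the paper's grid).
PARAM_GRID = np.logspace(-2, 1, num=10)

N_REPETITIONS = 10        # each algorithm is run 10 times per parameter value
UPDATE_EVERY = 100        # internal state is only updated every 100 rounds

# Three policy/regression classes: linear, and gradient-boosted regression
# trees of depth 2 and depth 5, each with 50 trees.
POLICY_CLASSES = {
    "Lin": {"type": "linear"},
    "GB2": {"type": "gbrt", "max_depth": 2, "n_estimators": 50},
    "GB5": {"type": "gbrt", "max_depth": 5, "n_estimators": 50},
}

def mu_t(c, K, L, T):
    """Exploration parameter for VCEE, mirroring mu_t = c * sqrt(1/(K*L*T))."""
    return c * np.sqrt(1.0 / (K * L * T))

if __name__ == "__main__":
    cfg = DATASETS["MSLR"]
    T = 30_000  # placeholder horizon: number of queries seen in the single pass
    for c in PARAM_GRID:
        print(f"c={c:.3g}  mu_t={mu_t(c, cfg['K'], cfg['L'], T):.5f}")

This only enumerates the reported settings (K, L, the µ_t schedule, the tuning grid, and the regression classes); it does not implement VCEE itself, whose pseudocode is given as Algorithm 1 in the paper.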