Contextual semibandits via supervised learning oracles
Authors: Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate this algorithm on two large-scale learning-to-rank datasets and compare with other contextual semibandit approaches. These experiments comprehensively demonstrate that effective exploration over a rich policy class can lead to significantly better performance than existing approaches. |
| Researcher Affiliation | Collaboration | College of Information and Computer Sciences, University of Massachusetts, Amherst, MA; Microsoft Research, New York, NY |
| Pseudocode | Yes | Algorithm 1 VCEE (Variance-Constrained Explore-Exploit) Algorithm (an illustrative oracle-baseline sketch appears after the table) |
| Open Source Code | Yes | Software is available at http://github.com/akshaykr/oracle_cb. |
| Open Datasets | Yes | We used two large-scale learning-to-rank datasets: MSLR [17] and all folds of the Yahoo! Learning-to-Rank dataset [5]. [17] MSLR: Microsoft Learning to Rank dataset. http://research.microsoft.com/en-us/projects/mslr/. [5] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge overview. In Yahoo! Learning to Rank Challenge, 2011. (See the loading sketch after the table.) |
| Dataset Splits | No | The paper mentions using the MSLR and Yahoo! Learning-to-Rank datasets and performing parameter tuning, but it does not specify the explicit train/validation/test splits (e.g., percentages or sample counts) that would be needed to reproduce the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states that software is available online but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | For MSLR, we choose K = 10 documents per query and set L = 3, while for Yahoo!, we set K = 6 and L = 2. All algorithms make a single pass over the queries. For computational reasons, we only update the learned policies every 100 rounds. For VCEE, we set µ_t = c·√(1/(KLT)) and tune c... We ran each algorithm for 10 repetitions, for each of ten logarithmically spaced parameter values. We consider three function classes: linear functions and depth-2 and depth-5 gradient boosted regression trees (abbreviated Lin, GB2, and GB5). Both GB classes use 50 trees. (See the configuration sketch after the table.) |
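
The Pseudocode row above cites the paper's Algorithm 1 (VCEE). The sketch below is not VCEE; it is a minimal ε-greedy-style oracle baseline for the contextual semibandit protocol (choose a slate of L documents out of K candidates per query and observe a reward for every chosen slot), included only to make the interaction loop and the role of a supervised regression oracle concrete. The environment interface (`env.context`, `env.feedback`), the ridge-regression oracle, and all parameter defaults are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Minimal epsilon-greedy oracle baseline for contextual semibandits (illustrative only).
# Each round: a context arrives as a (K, d) feature matrix (one row per candidate document),
# the learner picks a slate of L documents, and observes a reward for every chosen slot
# (semibandit feedback). A supervised regression oracle is refit periodically.

def run_semibandit_baseline(env, T, K, L, epsilon=0.1, refit_every=100, seed=0):
    rng = np.random.default_rng(seed)
    oracle = Ridge(alpha=1.0)                 # supervised learning oracle (assumed choice)
    X_hist, y_hist = [], []                   # per-slot training data collected so far
    fitted = False

    for t in range(T):
        features = env.context(t)             # shape (K, d): features of K candidate docs
        if fitted and rng.random() > epsilon:
            scores = oracle.predict(features) # exploit: greedy top-L slate by predicted reward
        else:
            scores = rng.random(K)            # explore: random slate
        slate = np.argsort(-scores)[:L]

        rewards = env.feedback(t, slate)      # semibandit feedback: one reward per chosen slot
        X_hist.append(features[slate])
        y_hist.append(rewards)

        if (t + 1) % refit_every == 0:        # refit the oracle only periodically (cf. the
            oracle.fit(np.vstack(X_hist),     # "every 100 rounds" updates reported above)
                       np.concatenate(y_hist))
            fitted = True
    return oracle
```

VCEE itself replaces the fixed ε with a variance-constrained distribution over oracle-returned policies and a minimum exploration probability µ_t; only the oracle-plus-periodic-refit skeleton is shared with the sketch above.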
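
Both MSLR and the Yahoo! Learning-to-Rank data are distributed in the LETOR/SVMlight text format (`relevance qid:<query> <feature>:<value> ...`). A minimal loading sketch, assuming a local copy at a hypothetical path such as `MSLR-WEB10K/Fold1/train.txt` (the path and fold layout are assumptions about the download, not specified by the report):

```python
from collections import defaultdict
from sklearn.datasets import load_svmlight_file

# Load a LETOR-formatted file: relevance label, query id, and sparse features per document.
X, relevance, qid = load_svmlight_file("MSLR-WEB10K/Fold1/train.txt", query_id=True)

# Group document indices by query so each query can serve as one "context"
# with its candidate documents (K of them per round in the experiments).
docs_by_query = defaultdict(list)
for i, q in enumerate(qid):
    docs_by_query[q].append(i)

print(f"{X.shape[0]} documents, {X.shape[1]} features, {len(docs_by_query)} queries")
```

Grouping documents by query id mirrors the per-query slate construction reported in the Experiment Setup row (K candidate documents per query, slates of length L).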
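
The Experiment Setup row reports the main hyperparameters. A minimal configuration sketch that encodes those reported values, assuming illustrative endpoints for the ten-point logarithmic grid over c (the report does not state them):

```python
import numpy as np

# Reported experimental configuration (values taken from the Experiment Setup row above).
DATASETS = {
    "MSLR":   {"K": 10, "L": 3},   # 10 candidate documents per query, slates of length 3
    "Yahoo!": {"K": 6,  "L": 2},   # 6 candidate documents per query, slates of length 2
}
FUNCTION_CLASSES = ["Lin", "GB2", "GB5"]   # linear, depth-2 / depth-5 boosted trees (50 trees each)
N_REPETITIONS = 10                         # 10 repetitions per parameter value
UPDATE_EVERY = 100                         # policies are only updated every 100 rounds

# Ten logarithmically spaced values of the tuning constant c; the endpoints below are an
# assumption for illustration -- the report only states "ten logarithmically spaced values".
C_GRID = np.logspace(-3, 1, num=10)

def mu_t(c, K, L, T):
    """VCEE minimum exploration probability as given in the row: mu_t = c * sqrt(1 / (K * L * T))."""
    return c * np.sqrt(1.0 / (K * L * T))
```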