Off-policy evaluation for slate recommendation

Authors: Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance."
Researcher Affiliation | Collaboration | Adith Swaminathan (Microsoft Research, Redmond; adswamin@microsoft.com); Akshay Krishnamurthy (University of Massachusetts, Amherst; akshay@cs.umass.edu); Alekh Agarwal (Microsoft Research, New York; alekha@microsoft.com); Miroslav Dudík (Microsoft Research, New York; mdudik@microsoft.com); John Langford (Microsoft Research, New York; jcl@microsoft.com); Damien Jose (Microsoft, Redmond; dajose@microsoft.com); Imed Zitouni (Microsoft, Redmond; izitouni@microsoft.com)
Pseudocode | No | The paper describes its procedures and methods in prose but does not include a dedicated pseudocode or algorithm block.
Open Source Code | Yes | "All of our code is available online." (Footnote 3: https://github.com/adith387/slates_semisynth_expts)
Open Datasets | Yes | "Our semi-synthetic evaluation uses labeled data from the Microsoft Learning to Rank Challenge dataset [30] (MSLR-WEB30K) to create a contextual bandit instance." [30] Tao Qin and Tie-Yan Liu. Introducing LETOR 4.0 datasets. arXiv:1306.2597, 2013.
Dataset Splits | Yes | "We use the provided 5-fold split and always train on bandit data collected by uniform logging from four folds, while evaluating with supervised data on the fifth." (A data-loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, memory amounts, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions software such as lasso regression models, regression tree models, gradient boosted regression trees, and LambdaMART, but does not specify library names or version numbers.
Experiment Setup | Yes | "Both PI-OPT and SUP train gradient boosted regression trees (with 1000 trees, each with up to 70 leaves)." (A configuration sketch follows the table.)
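
A hypothetical sketch of the 5-fold protocol quoted in the Dataset Splits row: MSLR-WEB30K ships in LETOR/SVMlight format, which scikit-learn can parse directly. The file paths, the load_svmlight_file call, and the fold layout below are illustrative assumptions, not the authors' released code (see the GitHub repository above for that).

    # Hypothetical reconstruction of the 5-fold protocol; paths are assumptions.
    from sklearn.datasets import load_svmlight_file

    def load_fold(path):
        """Return features X, relevance labels y, and query ids for one MSLR fold."""
        X, y, qid = load_svmlight_file(path, query_id=True)
        return X, y, qid

    # Folds 1-4 supply contexts on which a uniformly random logging policy
    # collects bandit data; fold 5 keeps its relevance labels so estimates
    # can be checked against supervised ground truth.
    logging_folds = [load_fold(f"MSLR-WEB30K/Fold{i}/train.txt") for i in range(1, 5)]
    evaluation_fold = load_fold("MSLR-WEB30K/Fold5/test.txt")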
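
The Experiment Setup row fixes the boosted-tree configuration for PI-OPT and SUP, but the paper does not name a library. The sketch below uses scikit-learn's GradientBoostingRegressor as a stand-in; only the tree count and leaf limit come from the paper.

    # Minimal sketch of the reported configuration: 1000 boosted regression
    # trees, each with at most 70 leaves. The choice of scikit-learn is an
    # assumption; the authors may have used a different implementation.
    from sklearn.ensemble import GradientBoostingRegressor

    gbrt = GradientBoostingRegressor(
        n_estimators=1000,   # "1000 trees"
        max_leaf_nodes=70,   # "each with up to 70 leaves"
    )
    # Usage: gbrt.fit(X_train, rewards); value_estimates = gbrt.predict(X_eval)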