Control Variates for Slate Off-Policy Evaluation

Authors: Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with real-world recommender data as well as synthetic data validate these improvements in practice.
Researcher Affiliation | Collaboration | Nikos Vlassis (Netflix); Ashok Chandrashekar (Warner Media); Fernando Amat Gil (Netflix); Nathan Kallus (Cornell University and Netflix)
Pseudocode | No | The paper describes its methods mathematically and in prose but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for these experiments is publicly available at https://github.com/fernandoamat/slate OPE.
Open Datasets | Yes | We have benchmarked the proposed estimators on the publicly available dataset MSLR-WEB30K from the Microsoft Learning to Rank Challenge (Qin and Liu, 2013).
Dataset Splits | No | The paper mentions splitting data into three folds (D0, D1, D2) for estimator construction, but does not give explicit train/validation/test splits for model evaluation (percentages, absolute counts, or references to predefined standard splits) that would support reproducibility.
Hardware Specification | No | The main paper text does not contain specific hardware details (GPU/CPU models, memory, etc.) used for running experiments. The reproducibility checklist indicates this information is in the Supplementary Text, which is not available in the provided document.
Software Dependencies | No | The paper mentions general tools like 'regression tree' and 'lasso' but does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | In Section 6, the paper specifies details such as 'M ∈ {10, 50, 100}', 'K ∈ {5, 10, 30}', 'regression model for the target policy (tree-based or lasso)', '300 independent runs for each setting', and 'tested two metrics, NDCG (additive) and ERR (non-additive)'. Section 7 further details synthetic data parameters like 'T = 20 random reward tensors' and 'φ_k(a_k) is drawn from a Gaussian distribution N(0.2/K, 0.01)'.
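
The settings quoted under Dataset Splits and Experiment Setup can be laid out as a small configuration sketch. This is an illustrative reading of the summary, not the authors' code. It rests on several assumptions: the settings form a full cross product; the folds D0, D1, D2 are near-equal random partitions; K is the number of slate slots; and 0.01 in N(0.2/K, 0.01) is a variance (standard deviation 0.1). The names `experiment_grid`, `split_three_folds`, `draw_slot_rewards`, and `num_actions` are hypothetical.

```python
import itertools
import random

# Values quoted in the summary (Section 6); variable names are illustrative.
M_VALUES = (10, 50, 100)
K_VALUES = (5, 10, 30)
REGRESSORS = ("tree", "lasso")
METRICS = ("NDCG", "ERR")
RUNS_PER_SETTING = 300


def experiment_grid():
    """Enumerate (M, K, regressor, metric) tuples.

    Assumption: the settings are fully crossed; the summary does not say so.
    """
    return list(itertools.product(M_VALUES, K_VALUES, REGRESSORS, METRICS))


def split_three_folds(n_samples, rng):
    """Shuffle sample indices and cut them into three folds D0, D1, D2.

    Assumption: near-equal random folds; the summary gives no fold sizes.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::3] for i in range(3)]


def draw_slot_rewards(num_slots, num_actions, rng, std=0.1):
    """Draw phi_k(a_k) for every slot k and action a_k from N(0.2 / K, 0.01).

    Assumption: 0.01 is the variance, so the default std is 0.1.
    """
    mean = 0.2 / num_slots
    return [
        [rng.gauss(mean, std) for _ in range(num_actions)]
        for _ in range(num_slots)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
grid = experiment_grid()
folds = split_three_folds(10, rng)
phi = draw_slot_rewards(num_slots=5, num_actions=4, rng=rng)
print(len(grid), "settings with", RUNS_PER_SETTING, "runs each")
```

Keeping the grid, the fold split, and the synthetic draw as separate functions lets each assumption be replaced independently once the paper's actual protocol is checked.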