Control Variates for Slate Off-Policy Evaluation
Authors: Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with real-world recommender data as well as synthetic data validate these improvements in practice. |
| Researcher Affiliation | Collaboration | Nikos Vlassis (Netflix); Ashok Chandrashekar (Warner Media); Fernando Amat Gil (Netflix); Nathan Kallus (Cornell University and Netflix) |
| Pseudocode | No | The paper describes its methods mathematically and in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for these experiments is publicly available at https://github.com/fernandoamat/slateOPE. |
| Open Datasets | Yes | We have benchmarked the proposed estimators on the publicly available dataset MSLR-WEB30K from the Microsoft Learning to Rank Challenge (Qin and Liu, 2013). |
| Dataset Splits | No | The paper splits the data into three folds (D_0, D_1, D_2) to construct its estimators, but it does not give train/validation/test splits for model evaluation: no percentages, absolute counts, or references to predefined standard splits that would allow the setup to be reproduced. |
| Hardware Specification | No | The main paper text does not contain specific hardware details (GPU/CPU models, memory, etc.) used for running experiments. The reproducibility checklist indicates this information is in the Supplementary Text, which is not available in the provided document. |
| Software Dependencies | No | The paper names general model classes such as 'regression tree' and 'lasso' but does not list specific software dependencies or version numbers. |
| Experiment Setup | Yes | In Section 6, the paper specifies details such as 'M ∈ {10, 50, 100}', 'K ∈ {5, 10, 30}', 'regression model for the target policy (tree-based or lasso)', '300 independent runs for each setting', and 'tested two metrics, NDCG (additive) and ERR (non-additive)'. Section 7 further details synthetic data parameters such as 'T = 20 random reward tensors' and 'φ_k(a_k) is drawn from a Gaussian distribution N(0.2/K, 0.01)'. |
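The synthetic setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: the reward is taken to be additive over slots (r(a) = Σ_k φ_k(a_k)), each of the K slots has M candidate actions, and the 0.01 in N(0.2/K, 0.01) is read as a variance (standard deviation 0.1). The function names `sample_slot_effects`, `additive_reward`, and `make_reward_tables` are hypothetical.

```python
import math
import random


def sample_slot_effects(num_slots, num_actions, variance=0.01, rng=None):
    """Draw phi[k][a] ~ N(0.2 / K, variance) for every slot k and action a.

    Assumption: the 0.01 in N(0.2/K, 0.01) is treated as a variance, so the
    standard deviation passed to random.gauss is sqrt(0.01) = 0.1.
    """
    rng = rng or random.Random()
    mean = 0.2 / num_slots  # per-slot mean, so K slots sum to an expected 0.2
    std = math.sqrt(variance)
    return [[rng.gauss(mean, std) for _ in range(num_actions)]
            for _ in range(num_slots)]


def additive_reward(phi, slate):
    """Hypothetical additive slate reward: sum over slots of phi[k][slate[k]]."""
    return sum(phi[k][a] for k, a in enumerate(slate))


def make_reward_tables(num_tables, num_slots, num_actions, seed=0):
    """Build T independent K x M tables of slot effects (T = 20 in the setup)."""
    rng = random.Random(seed)
    return [sample_slot_effects(num_slots, num_actions, rng=rng)
            for _ in range(num_tables)]


if __name__ == "__main__":
    K, M = 5, 10
    tables = make_reward_tables(num_tables=20, num_slots=K, num_actions=M, seed=42)
    rng = random.Random(1)
    slate = [rng.randrange(M) for _ in range(K)]  # one action index per slot
    print(len(tables), additive_reward(tables[0], slate))
```

Drawing the per-slot mean as 0.2/K keeps the expected slate reward near 0.2 regardless of the slate length K, which is presumably why the mean is scaled by K; a non-additive reward (as with ERR) would need a different construction.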