Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Control Variates for Slate Off-Policy Evaluation
Authors: Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with real-world recommender data as well as synthetic data validate these improvements in practice. |
| Researcher Affiliation | Collaboration | Nikos Vlassis (Netflix); Ashok Chandrashekar (Warner Media); Fernando Amat Gil (Netflix); Nathan Kallus (Cornell University and Netflix) |
| Pseudocode | No | The paper describes its methods mathematically and in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for these experiments is publicly available at https://github.com/fernandoamat/slateOPE. |
| Open Datasets | Yes | We have benchmarked the proposed estimators on the publicly available dataset MSLR-WEB30K from the Microsoft Learning to Rank Challenge (Qin and Liu, 2013). |
| Dataset Splits | No | The paper mentions splitting data into three folds (D0, D1, D2) for their estimator construction, but does not explicitly provide specific train/validation/test dataset splits for model evaluation with percentages, absolute counts, or references to predefined standard splits for reproducibility. |
| Hardware Specification | No | The main paper text does not contain specific hardware details (GPU/CPU models, memory, etc.) used for running experiments. The reproducibility checklist indicates this information is in the Supplementary Text, which is not available in the provided document. |
| Software Dependencies | No | The paper mentions general tools like 'regression tree' and 'lasso' but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In Section 6, the paper specifies details such as 'M ∈ {10, 50, 100}', 'K ∈ {5, 10, 30}', 'regression model for the target policy (tree-based or lasso)', '300 independent runs for each setting', and 'tested two metrics, NDCG (additive) and ERR (non-additive)'. Section 7 further details synthetic data parameters such as 'T = 20 random reward tensors' and 'φ_k(a_k) is drawn from a Gaussian distribution N(0.2/K, 0.01)'. |
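
The parameters quoted in the Experiment Setup row can be illustrated with a small sketch. This is a hypothetical example, not the authors' code (their repository is linked above). It enumerates the reported MSLR-WEB30K grid, treating the metric as a separate grid axis (an assumption), and draws T = 20 synthetic reward tables with each φ_k(a_k) ~ N(0.2/K, 0.01). It further assumes that 0.01 is a variance, that the slate reward is additive over slots, and that every slot has the same number of candidate actions. All function names and the `num_actions` parameter are illustrative only.

```python
import itertools
import random

# Values as reported in the Experiment Setup row; their precise meaning
# is defined in the paper and is not restated here.
M_VALUES = [10, 50, 100]
K_VALUES = [5, 10, 30]
REG_MODELS = ["tree", "lasso"]   # regression model for the target policy
METRICS = ["NDCG", "ERR"]        # additive and non-additive metrics
RUNS_PER_SETTING = 300


def experiment_grid():
    """All (M, K, model, metric) combinations.

    Treating the metric as its own grid axis is an assumption.
    """
    return list(itertools.product(M_VALUES, K_VALUES, REG_MODELS, METRICS))


def draw_phi(num_slots, num_actions, rng, variance=0.01):
    """Draw phi_k(a_k) ~ N(0.2/K, variance) for every slot k and action a_k.

    Assumes the second parameter of N(0.2/K, 0.01) is a variance and that
    each slot has the same (hypothetical) number of candidate actions.
    """
    mean = 0.2 / num_slots
    sd = variance ** 0.5
    return [[rng.gauss(mean, sd) for _ in range(num_actions)]
            for _ in range(num_slots)]


def additive_reward(phi, slate):
    """Reward of a slate (a_1, ..., a_K), assumed to be sum_k phi_k(a_k)."""
    return sum(phi[k][a] for k, a in enumerate(slate))


def draw_reward_tables(num_tables=20, num_slots=5, num_actions=10, seed=0):
    """Draw num_tables independent reward tables (T = 20 in the paper)."""
    rng = random.Random(seed)
    return [draw_phi(num_slots, num_actions, rng) for _ in range(num_tables)]


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid), "settings,", len(grid) * RUNS_PER_SETTING, "runs")
    tables = draw_reward_tables()
    print(round(additive_reward(tables[0], [0, 1, 2, 3, 4]), 3))
```

Because each slot's mean is 0.2/K, the expected additive slate reward is K · 0.2/K = 0.2 for every K. Under the grid assumption above, there are 3 × 3 × 2 × 2 = 36 settings.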