Recommendations as Treatments: Debiasing Learning and Evaluation

Authors: Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our conceptual and theoretical contributions are validated in an extensive empirical evaluation. For the task of evaluating recommender systems, we show that our performance estimators can be orders-of-magnitude more accurate than standard estimators commonly used in the past (Bell et al., 2007).
Researcher Affiliation | Academia | Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims; Cornell University, Ithaca, NY, USA; {TBS49, FA234, AS3354, NC475, TJ36}@CORNELL.EDU
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide an implementation of our method, as well as a new benchmark dataset, online (footnote 1: https://www.cs.cornell.edu/~schnabts/mnar/).
Open Datasets | Yes | ML100K Dataset. The ML100K dataset (footnote 4: http://grouplens.org/datasets/movielens/) provides 100K MNAR ratings for 1683 movies by 944 users.
Dataset Splits | Yes | In all experiments, we perform model selection for the regularization parameter λ and/or the rank of the factorization d via cross-validation as follows. We randomly split the observed MNAR ratings into k folds (k = 4 in all experiments), training on k − 1 folds and evaluating on the remaining one using the IPS estimator.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cluster specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using "Limited-memory BFGS (Byrd et al., 1995)" for optimization and refers to "A standard regularized logistic regression (Pedregosa et al., 2011)" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For MF-IPS and MF-Naive all hyperparameters (i.e., λ ∈ {10^−6, ..., 1} and d ∈ {5, 10, 20, 40}) were chosen by cross-validation.
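The Research Type row contrasts the paper's performance estimators with the standard estimators used earlier. As an illustration only, the sketch below compares a naive estimator (average loss over the observed ratings, which is our assumed reading of "standard estimator") with an inverse-propensity-scoring (IPS) estimate that divides each observed loss by its observation probability and normalizes by the total number of user-item cells. It assumes squared error as the loss, known propensities, and independent Bernoulli observations; the function names and toy numbers are hypothetical and not taken from the authors' released code.

```python
from itertools import product


def naive_estimate(observed, ratings, predictions):
    """Mean squared error over the observed (MNAR) entries only."""
    errors = [(ratings[c] - predictions[c]) ** 2 for c in observed]
    return sum(errors) / len(errors)


def ips_estimate(observed, ratings, predictions, propensities, num_cells):
    """IPS estimate of the MSE over all num_cells user-item pairs:
    each observed squared error is reweighted by 1 / P(cell observed)."""
    total = sum((ratings[c] - predictions[c]) ** 2 / propensities[c] for c in observed)
    return total / num_cells


def expected_ips(ratings, predictions, propensities):
    """Exact expectation of the IPS estimate over every observation pattern,
    assuming each cell is revealed independently with its propensity."""
    cells = list(ratings)
    expectation = 0.0
    for mask in product([0, 1], repeat=len(cells)):
        prob = 1.0
        for cell, bit in zip(cells, mask):
            prob *= propensities[cell] if bit else 1.0 - propensities[cell]
        observed = [c for c, bit in zip(cells, mask) if bit]
        expectation += prob * ips_estimate(observed, ratings, predictions, propensities, len(cells))
    return expectation


# Hypothetical 2x2 example: high ratings are much more likely to be observed.
ratings = {(0, 0): 5, (0, 1): 1, (1, 0): 4, (1, 1): 2}
predictions = {cell: 4 for cell in ratings}
propensities = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.2}

print(naive_estimate([(0, 0), (1, 0)], ratings, predictions))  # 0.5
print(expected_ips(ratings, predictions, propensities))  # approximately 3.5
```

The true MSE over all four cells is (1 + 9 + 0 + 4) / 4 = 3.5. On the likely observation pattern the naive estimate is 0.5, badly underestimating the error, while the IPS estimate equals 3.5 in expectation. Any single draw of the IPS estimate can still vary, which is the bias/variance trade-off such reweighting introduces.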
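The Dataset Splits row describes randomly splitting the observed MNAR ratings into k = 4 folds, training on k − 1 of them and evaluating on the held-out fold. A minimal sketch of that loop follows, assuming observed ratings are identified by (user, item) pairs. The `fit` and `score` callables are hypothetical placeholders: in the setup described above, `score` would return an IPS estimate of the loss on the held-out ratings, but its exact form (for example, how propensities are adjusted for the fold fraction) is not specified here.

```python
import random


def k_fold_splits(observed, k=4, seed=0):
    """Randomly partition the observed (user, item) pairs into k folds and
    yield (train, held_out) pairs, training on the other k - 1 folds each time."""
    cells = list(observed)
    random.Random(seed).shuffle(cells)
    folds = [cells[i::k] for i in range(k)]
    for i in range(k):
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, folds[i]


def cross_validate(observed, fit, score, k=4, seed=0):
    """Average the held-out score over the k folds.
    `fit(train)` returns a model; `score(model, held_out)` returns a loss."""
    scores = [score(fit(train), held_out) for train, held_out in k_fold_splits(observed, k, seed)]
    return sum(scores) / len(scores)
```

Using a seeded `random.Random` instance keeps the split reproducible without touching the global random state.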
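The Experiment Setup row lists the hyperparameter grid searched for MF-IPS and MF-Naive. The sketch below enumerates that grid and picks the pair with the lowest cross-validated score. It reads the elided λ range "10^−6, ..., 1" as consecutive powers of ten, which is an assumption. `cv_score` is a hypothetical callable standing in for a cross-validated loss, such as the mean IPS-estimated loss over folds.

```python
from itertools import product

# Assumption: the elided lambda grid is read as consecutive powers of ten, 1e-6 .. 1.
LAMBDAS = [10.0 ** e for e in range(-6, 1)]
# Factorization ranks d, as listed in the Experiment Setup row.
RANKS = [5, 10, 20, 40]


def select_hyperparameters(cv_score, lambdas=LAMBDAS, ranks=RANKS):
    """Return the (lambda, d) pair with the lowest cross-validated score.
    `cv_score(lam, d)` is a hypothetical callable returning a loss."""
    return min(product(lambdas, ranks), key=lambda pair: cv_score(*pair))
```

Under this reading the search covers 7 × 4 = 28 configurations, each requiring k model fits during cross-validation.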