Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Authors: Adith Swaminathan, Thorsten Joachims

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | POEM is evaluated on several multi-label classification problems, showing substantially improved robustness and generalization performance compared to the state-of-the-art.
Researcher Affiliation | Academia | Adith Swaminathan (ADITH@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA; Thorsten Joachims (TJ@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured code-like blocks outlining a procedure.
Open Source Code | Yes | Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6.
Open Datasets | Yes | We conducted experiments on different multi-label datasets collected from the LibSVM repository, with different ranges for p (features), q (labels) and n (samples) represented, as summarized in Table 2.
Dataset Splits | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. (A sketch of this estimator follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions 'scikit-learn implementation' and 'AdaGrad' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ = c·λ*, where λ* is the calibration factor from Section 4.2 and c ∈ {10^-6, ..., 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th-percentile to the 10th-percentile propensity score observed in the training set of D. For all methods, when optimizing any objective over w, we always begin the optimization from w = 0 (i.e., h_w = uniform(Y)). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence. (A sketch of the λ grid and clipping constant follows the table.)
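
The Dataset Splits and Experiment Setup rows both refer to the unbiased counterfactual estimator of Equation (1), i.e. the (optionally clipped) inverse-propensity-scoring estimate of a policy's risk, evaluated on the held-out 25% split. The sketch below is a minimal illustration of that estimator, not the authors' released POEM code; the names ips_estimate, losses, propensities, new_probs, and clip_M are hypothetical.

```python
import numpy as np

def ips_estimate(losses, propensities, new_probs, clip_M=None):
    """Inverse-propensity-scoring estimate of a policy's risk from logged
    bandit feedback (x_i, y_i, delta_i, p_i):
        R_hat = mean( delta_i * h_w(y_i | x_i) / p_i ),
    optionally clipping the importance weights at M."""
    weights = new_probs / propensities           # importance weights h_w / p
    if clip_M is not None:
        weights = np.minimum(weights, clip_M)    # clipped estimator
    return float(np.mean(losses * weights))

# Tiny synthetic check: uniform logging policy over 4 actions.
rng = np.random.default_rng(0)
n = 1000
losses = rng.uniform(-1.0, 0.0, size=n)          # delta_i, lower is better
propensities = np.full(n, 0.25)                  # p_i under the logging policy
new_probs = rng.uniform(0.0, 1.0, size=n)        # h_w(y_i | x_i) for a candidate policy
print(ips_estimate(losses, propensities, new_probs, clip_M=10.0))
```

Hyper-parameter selection then amounts to picking the setting whose trained policy has the lowest such estimated risk on the validation split.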
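The Experiment Setup row also describes the variance-regularization grid λ = c·λ* and the clipping constant M. The sketch below shows one plausible reading of those two rules as quoted; hyperparameter_grid and clipping_constant are hypothetical names, not functions from the released software.

```python
import numpy as np

def hyperparameter_grid(lambda_star):
    """Grid lambda = c * lambda_star with c in {1e-6, 1e-5, ..., 1}
    (multiples of 10, as quoted in the Experiment Setup row)."""
    return [lambda_star * 10.0 ** k for k in range(-6, 1)]

def clipping_constant(train_propensities):
    """Clipping constant M: ratio of the 90th- to the 10th-percentile
    propensity score observed in the training split."""
    p90 = np.percentile(train_propensities, 90)
    p10 = np.percentile(train_propensities, 10)
    return p90 / p10

print(hyperparameter_grid(0.5))                              # seven candidate lambdas
print(clipping_constant(np.array([0.05, 0.1, 0.2, 0.4, 0.8])))
```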