Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Authors: Adith Swaminathan, Thorsten Joachims
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | POEM is evaluated on several multi-label classification problems, showing substantially improved robustness and generalization performance compared to the state-of-the-art. |
| Researcher Affiliation | Academia | Adith Swaminathan (ADITH@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA; Thorsten Joachims (TJ@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured code-like blocks outlining a procedure. |
| Open Source Code | Yes | Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6. |
| Open Datasets | Yes | We conducted experiments on different multi-label datasets collected from the LibSVM repository, with different ranges of p (features), q (labels), and n (samples), as summarized in Table 2. |
| Dataset Splits | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters (a sketch of this estimator appears below the table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'scikit-learn implementation' and 'AdaGrad' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ is set to c·λ*, where λ* is the calibration factor from Section 4.2 and c ∈ {10^-6, ..., 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th-percentile to the 10th-percentile propensity score observed in the training set of D. For all methods, when optimizing any objective over w, we always begin the optimization from w = 0 (i.e., h_w = uniform(Y)). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size 100 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence (sketches of the estimator and of the M / λ calibration follow the table). |
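
The validation protocol above selects hyper-parameters with the unbiased counterfactual estimator of Equation (1), i.e., an inverse-propensity-scoring (IPS) average of logged losses reweighted by the ratio between the candidate policy's and the logging policy's propensities. The sketch below illustrates that estimator and a grid search driven by it; the array and function names are hypothetical, and this is not the authors' released POEM code.

```python
import numpy as np

def ips_estimate(losses, logging_propensities, new_policy_probs):
    """Unbiased counterfactual (IPS) estimate of a policy's expected loss,
    in the spirit of Equation (1) of the paper.

    losses               : delta_i, loss observed for each logged action
    logging_propensities : p_i, logging policy's probability of that action
    new_policy_probs     : h(y_i | x_i), the candidate policy's probability
                           of the same action (an indicator if deterministic)
    """
    weights = new_policy_probs / logging_propensities  # importance weights
    return float(np.mean(losses * weights))

def select_hyperparameter(candidates, fit_fn, evaluate_fn):
    """Hypothetical grid search: fit one policy per candidate value and keep
    the one whose IPS-estimated validation risk is lowest."""
    best_value, best_risk = None, np.inf
    for value in candidates:
        policy = fit_fn(value)                      # train on the 75% split
        losses, props, probs = evaluate_fn(policy)  # 25% validation split
        risk = ips_estimate(losses, props, probs)
        if risk < best_risk:
            best_value, best_risk = value, risk
    return best_value
```

Because the estimate reweights only logged actions, it can be computed from the held-out 25% of D without any additional interaction with the logging system, which is what makes it usable for model selection here.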
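The same row describes how the clipping constant M and the variance-regularization grid are calibrated: M is the ratio of the 90th-percentile to the 10th-percentile propensity score in the training split, and λ is swept over c·λ* for c in {10^-6, ..., 1} in multiples of 10. A minimal sketch under those assumptions (the function name and the λ* argument are placeholders, not the authors' code):

```python
import numpy as np

def calibrate_setup(train_propensities, lambda_star):
    """Clipping constant and lambda grid as described in the experiment setup.

    train_propensities : logging propensities observed in the training split
    lambda_star        : calibration factor (Section 4.2 of the paper)
    """
    p90, p10 = np.percentile(train_propensities, [90, 10])
    clipping_M = p90 / p10                 # 90th / 10th percentile ratio

    # c in {1e-6, 1e-5, ..., 1}: seven values, spaced by factors of 10
    lambda_grid = [c * lambda_star for c in 10.0 ** np.arange(-6.0, 1.0)]
    return clipping_M, lambda_grid
```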