Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Authors: Adith Swaminathan, Thorsten Joachims

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | POEM is evaluated on several multi-label classification problems, showing substantially improved robustness and generalization performance compared to the state-of-the-art.
Researcher Affiliation | Academia | Adith Swaminathan (ADITH@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA; Thorsten Joachims (TJ@CS.CORNELL.EDU), Cornell University, Ithaca, NY 14853 USA
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured code-like blocks outlining a procedure.
Open Source Code | Yes | Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6.
Open Datasets | Yes | We conducted experiments on different multi-label datasets collected from the LibSVM repository, with different ranges for p (features), q (labels) and n (samples) represented, as summarized in Table 2.
Dataset Splits | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. (A sketch of this estimator follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions 'scikit-learn implementation' and 'AdaGrad' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We keep aside 25% of D as a validation set and use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ = c·λ*, where λ* is the calibration factor from Section 4.2 and c ∈ {10^-6, ..., 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th-percentile to the 10th-percentile propensity score observed in the training set of D. For all methods, when optimizing any objective over w, we always begin the optimization from w = 0 (i.e., h_w = uniform(Y)). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence. (A sketch of the λ grid and clipping constant follows the table.)
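
The Dataset Splits and Experiment Setup rows both refer to the unbiased counterfactual estimator of Equation (1), i.e. the (optionally clipped) inverse-propensity-scoring estimate of a policy's risk, evaluated on the held-out 25% split. The sketch below is a minimal illustration of that estimator, not the authors' released POEM code; the names ips_estimate, losses, propensities, new_probs, and clip_M are hypothetical.

```python
import numpy as np

def ips_estimate(losses, propensities, new_probs, clip_M=None):
    """Inverse-propensity-scoring estimate of a policy's risk from logged
    bandit feedback (x_i, y_i, delta_i, p_i):
        R_hat = mean( delta_i * h_w(y_i | x_i) / p_i ),
    optionally clipping the importance weights at M."""
    weights = new_probs / propensities           # importance weights h_w / p
    if clip_M is not None:
        weights = np.minimum(weights, clip_M)    # clipped estimator
    return float(np.mean(losses * weights))

# Tiny synthetic check: uniform logging policy over 4 actions.
rng = np.random.default_rng(0)
n = 1000
losses = rng.uniform(-1.0, 0.0, size=n)          # delta_i, lower is better
propensities = np.full(n, 0.25)                  # p_i under the logging policy
new_probs = rng.uniform(0.0, 1.0, size=n)        # h_w(y_i | x_i) for a candidate policy
print(ips_estimate(losses, propensities, new_probs, clip_M=10.0))
```

Hyper-parameter selection then amounts to picking the setting whose trained policy has the lowest such estimated risk on the validation split.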
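The Experiment Setup row also describes the variance-regularization grid λ = c·λ* and the clipping constant M. The sketch below shows one plausible reading of those two rules as quoted; hyperparameter_grid and clipping_constant are hypothetical names, not functions from the released software.

```python
import numpy as np

def hyperparameter_grid(lambda_star):
    """Grid lambda = c * lambda_star with c in {1e-6, 1e-5, ..., 1}
    (multiples of 10, as quoted in the Experiment Setup row)."""
    return [lambda_star * 10.0 ** k for k in range(-6, 1)]

def clipping_constant(train_propensities):
    """Clipping constant M: ratio of the 90th- to the 10th-percentile
    propensity score observed in the training split."""
    p90 = np.percentile(train_propensities, 90)
    p10 = np.percentile(train_propensities, 10)
    return p90 / p10

print(hyperparameter_grid(0.5))                              # seven candidate lambdas
print(clipping_constant(np.array([0.05, 0.1, 0.2, 0.4, 0.8])))
```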