The Self-Normalized Estimator for Counterfactual Learning
Authors: Adith Swaminathan, Thorsten Joachims
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the empirical effectiveness of Norm-POEM on several multi-label classification problems, finding that it consistently outperforms the conventional estimator. |
| Researcher Affiliation | Academia | Adith Swaminathan Department of Computer Science Cornell University adith@cs.cornell.edu Thorsten Joachims Department of Computer Science Cornell University tj@cs.cornell.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Software implementing Norm-POEM is available at http://www.cs.cornell.edu/~adith/POEM. |
| Open Datasets | Yes | The experimental setup uses supervised multi-label classification datasets from the LibSVM repository. The inputs are x ∈ ℝ^p, and the predictions y ∈ {0, 1}^q are bit vectors indicating the labels assigned to x. The datasets span a range of feature counts p, label counts q, and instance counts n: Scene (p = 294, q = 6, n_train = 1211, n_test = 1196); Yeast (p = 103, q = 14, n_train = 1500, n_test = 917); TMC (p = 30438, q = 22, n_train = 21519, n_test = 7077); LYRL (p = 47236, q = 4, n_train = 23149, n_test = 781265). |
| Dataset Splits | Yes | Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D. In summary, the experimental setup is identical to POEM [1]. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or specific computational resources. |
| Software Dependencies | No | The paper mentions that 'CRF is implemented by scikit-learn [27]', but it does not specify the version number of scikit-learn or any other software dependencies. |
| Experiment Setup | Yes | Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D; in summary, the experimental setup is identical to POEM [1]. To simulate a bandit feedback dataset D, a CRF with default hyper-parameters trained on 5% of the supervised dataset serves as the logging policy h0; the training data is replayed 4 times and sampled labels are collected from h0. Since the choice of optimization method could be a confounder, L-BFGS is used for all methods and experiments. |
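The self-normalized estimator the paper studies can be illustrated on simulated bandit feedback of the kind described in the experiment setup row. The sketch below is a toy illustration under assumptions not taken from the paper's code: context-free policies, synthetic losses, and hypothetical variable names. It contrasts the conventional inverse-propensity-scoring (IPS) estimator with its self-normalized variant, which divides by the sum of importance weights rather than the sample count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a logging policy h0 picks one of K actions per
# round; only the loss of the chosen action is observed (bandit feedback).
n, K = 10_000, 5
p_log = rng.dirichlet(np.ones(K))           # logging policy h0 (context-free for brevity)
actions = rng.choice(K, size=n, p=p_log)    # logged actions y_i ~ h0
losses = rng.random(n) * (actions == 0)     # synthetic losses delta_i, nonzero for action 0

p_target = np.ones(K) / K                   # target policy h to evaluate
w = p_target[actions] / p_log[actions]      # importance weights h(y_i) / h0(y_i)

ips = np.mean(losses * w)                   # conventional IPS estimator
snips = np.sum(losses * w) / np.sum(w)      # self-normalized estimator

print(f"IPS: {ips:.4f}  SNIPS: {snips:.4f}")
```

Because the self-normalized estimate is a weighted average of observed losses, it always lies within the range of the losses, which is the equivariance property that makes it robust to the propensity-overfitting behavior of the conventional estimator.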