Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
The Self-Normalized Estimator for Counterfactual Learning
Authors: Adith Swaminathan, Thorsten Joachims
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the empirical effectiveness of Norm-POEM on several multi-label classification problems, finding that it consistently outperforms the conventional estimator. |
| Researcher Affiliation | Academia | Adith Swaminathan Department of Computer Science Cornell University EMAIL Thorsten Joachims Department of Computer Science Cornell University EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Software implementing Norm-POEM is available at http://www.cs.cornell.edu/~adith/POEM. |
| Open Datasets | Yes | The experiment setup uses supervised datasets for multi-label classification from the LibSVM repository. In these datasets, the inputs x ∈ ℝ^p. The predictions y ∈ {0, 1}^q are bitvectors indicating the labels assigned to x. The datasets have a range of features p, labels q and instances n: Scene (p = 294, q = 6, n_train = 1211, n_test = 1196); Yeast (p = 103, q = 14, n_train = 1500, n_test = 917); TMC (p = 30438, q = 22, n_train = 21519, n_test = 7077); LYRL (p = 47236, q = 4, n_train = 23149, n_test = 781265). |
| Dataset Splits | Yes | Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D; in summary, our experimental setup is identical to POEM [1]. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or specific computational resources. |
| Software Dependencies | No | The paper mentions that 'CRF is implemented by scikit-learn [27]', but it does not specify the version number of scikit-learn or any other software dependencies. |
| Experiment Setup | Yes | 'Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D; in summary, our experimental setup is identical to POEM [1].' and 'To simulate a bandit feedback dataset D, we use a CRF with default hyper-parameters trained on 5% of the supervised dataset as h0, and replay the training data 4 times and collect sampled labels from h0.' and 'Since the choice of optimization method could be a confounder, we use L-BFGS for all methods and experiments.' |
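
The experiment setup quoted in the table (replaying supervised data through a logging policy h0, then holding out 25% of the resulting dataset D) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the `label_probs` array standing in for the CRF's per-label probabilities, the independent per-label sampling, and the use of Hamming loss as feedback are all assumptions made here for illustration.

```python
import numpy as np


def simulate_bandit_feedback(X, Y, label_probs, n_replays=4, seed=0):
    """Replay supervised data through a stochastic logging policy.

    X: (n, p) features; Y: (n, q) true 0/1 label vectors.
    label_probs: (n, q) probability that the logging policy sets each
        label to 1. Hypothetical stand-in for the CRF h0; assumes the
        policy samples each label bit independently.
    Returns replayed features, sampled predictions, losses, propensities.
    """
    rng = np.random.default_rng(seed)
    X_rep = np.tile(X, (n_replays, 1))
    Y_rep = np.tile(Y, (n_replays, 1))
    P_rep = np.tile(label_probs, (n_replays, 1))

    # Sample one predicted label vector per replayed instance.
    Y_hat = (rng.random(P_rep.shape) < P_rep).astype(int)

    # Feedback: Hamming loss against the true labels (assumed loss).
    loss = (Y_hat != Y_rep).sum(axis=1).astype(float)

    # Propensity: probability the logging policy gave the sampled vector.
    propensity = np.where(Y_hat == 1, P_rep, 1.0 - P_rep).prod(axis=1)
    return X_rep, Y_hat, loss, propensity


def holdout_split(n, frac=0.25, seed=0):
    """Return (train_idx, val_idx) with a `frac` share held out."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_val = int(round(frac * n))
    return perm[n_val:], perm[:n_val]
```

In this sketch, the hold-out indices would be used to choose λ and M on the simulated dataset, while the supervised test split stays untouched for the final evaluation.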