Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Structured Prediction Energy Networks
Authors: David Belanger, Andrew McCallum
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments provide impressive performance on a variety of benchmark multi-label classification tasks, demonstrate that our technique can be used to provide interpretable structure learning, and illuminate fundamental trade-offs between feedforward and iterative structured prediction. |
| Researcher Affiliation | Academia | David Belanger EMAIL Andrew McCallum EMAIL College of Information and Computer Sciences, University of Massachusetts Amherst |
| Pseudocode | No | The paper mentions "Appendix A.2 provides a computation graph for this architecture," but this is a graph, not pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | Table 1 compares SPENs to a variety of high-performing baselines on a selection of standard multi-label classification tasks. Dataset sizes, etc. are described in Table 4. ... In Table 2, we consider the 14-label yeast dataset (Elisseeff & Weston, 2001)... |
| Dataset Splits | Yes | For Bibtex and Delicious, we tune hyperparameters by pooling the train and test data and sampling without replacement to make a split of the same size as the original. For Bookmarks, we use the same train-dev-test split as Lin et al. (2014). ... We report hamming error, using 10-fold cross validation. |
| Hardware Specification | No | The paper states: "Prediction, both at train and test time, is performed in parallel in large minibatches on a GPU." This mentions a type of hardware (GPU) but lacks specific details such as the model, memory, or other processor specifications. |
| Software Dependencies | No | The paper describes the methods and architectures used, but does not provide specific version numbers for software libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | We selected 15 linear measurements (rows of C1 in (5)) for Bookmarks and Bibtex, and 5 for Delicious. Section A.5 describes additional choices of hyperparameters. ... In Table 3 we compare: a linear classifier, a 3-layer ReLU MLP with hidden units of size 64 and 16, and a SPEN with a simple linear local energy network and a 2-layer global energy network with HardTanh activations and 4 hidden units. |
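
The Dataset Splits row quotes two evaluation procedures: pooling train and test data and resampling without replacement into a split of the original size, and reporting Hamming error under 10-fold cross-validation. The sketch below illustrates both in plain Python. It is not the authors' code: the function names `hamming_error`, `k_fold_indices`, and `resplit`, the seeding scheme, and the way folds are dealt out are all assumptions made for illustration.

```python
import random


def hamming_error(y_true, y_pred):
    """Fraction of individual label positions where prediction and truth differ.

    Both arguments are lists of equal-length 0/1 label vectors.
    """
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def k_fold_indices(n_examples, k=10, seed=0):
    """Shuffle example indices once and deal them into k nearly equal folds.

    Assumption: the paper does not say how its folds were constructed.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def resplit(n_train, n_test, seed=0):
    """Pool train and test indices, then sample without replacement
    into a new split with the same sizes as the original one."""
    pooled = list(range(n_train + n_test))
    random.Random(seed).shuffle(pooled)
    return pooled[:n_train], pooled[n_train:]
```

In a cross-validation loop, each fold from `k_fold_indices` would serve once as the held-out set, and the per-fold `hamming_error` values would be averaged.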