Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Structured Prediction Energy Networks

Authors: David Belanger, Andrew McCallum

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments provide impressive performance on a variety of benchmark multi-label classification tasks, demonstrate that our technique can be used to provide interpretable structure learning, and illuminate fundamental trade-offs between feedforward and iterative structured prediction.
Researcher Affiliation	Academia	David Belanger EMAIL Andrew McCallum EMAIL College of Information and Computer Sciences, University of Massachusetts Amherst
Pseudocode	No	The paper mentions "Appendix A.2 provides a computation graph for this architecture," but this is a graph, not pseudocode or an algorithm block.
Open Source Code	No	The paper does not include any explicit statements about releasing source code or links to a code repository.
Open Datasets	Yes	Table 1 compares SPENs to a variety of high-performing baselines on a selection of standard multi-label classification tasks. Dataset sizes, etc. are described in Table 4. ... In Table 2, we consider the 14-label yeast dataset (Elisseeff & Weston, 2001)...
Dataset Splits	Yes	For Bibtex and Delicious, we tune hyperparameters by pooling the train and test data and sampling without replacement to make a split of the same size as the original. For Bookmarks, we use the same train-dev-test split as Lin et al. (2014). ... We report hamming error, using 10-fold cross validation.
Hardware Specification	No	The paper states: "Prediction, both at train and test time, is performed in parallel in large minibatches on a GPU." This mentions a type of hardware (GPU) but lacks specific details such as the model, memory, or other processor specifications.
Software Dependencies	No	The paper describes the methods and architectures used, but does not provide specific version numbers for software libraries, frameworks, or programming languages.
Experiment Setup	Yes	We selected 15 linear measurements (rows of C1 in (5)) for Bookmarks and Bibtex, and 5 for Delicious. Section A.5 describes additional choices of hyperparameters. ... In Table 3 we compare: a linear classifier, a 3-Layer ReLU MLP with hidden units of size 64 and 16, and a SPEN with a simple linear local energy network and a 2-layer global energy network with HardTanh activations and 4 hidden units.
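
The Dataset Splits row quotes two procedures that are easy to misread: re-splitting pooled train and test data by sampling without replacement, and reporting Hamming error. The following is a minimal sketch of both, not the authors' code. The function names `resplit_pooled` and `hamming_error`, the `seed` parameter, and the toy data are assumptions made for illustration, and the quoted excerpt does not say how the resampled split is then used for tuning.

```python
import random


def resplit_pooled(train, test, seed=0):
    """Pool train and test examples and draw a new split of the original sizes.

    Hypothetical helper: shuffling the pooled list and slicing it is one way
    to sample without replacement.
    """
    pooled = list(train) + list(test)
    rng = random.Random(seed)  # local RNG so the split is repeatable
    rng.shuffle(pooled)
    n_train = len(train)
    return pooled[:n_train], pooled[n_train:]


def hamming_error(y_true, y_pred):
    """Fraction of label positions that disagree, over all examples and labels."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("label vectors must have the same length")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Toy usage with made-up data: 6 training and 4 test examples.
new_train, new_test = resplit_pooled(range(6), range(6, 10), seed=0)
error = hamming_error([[1, 0, 1], [0, 0, 1]], [[1, 1, 1], [0, 0, 0]])
```

In the toy usage, two of the six label positions disagree, so `error` is 2/6. The new split keeps the original sizes (6 and 4) and every example appears in exactly one of the two parts.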