Learning Approximate Inference Networks for Structured Prediction
Authors: Lifu Tu, Kevin Gimpel
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7 (Experiments): In Sec. 7.1 we compare our approach to previous work on training SPENs for MLC. We compare accuracy and speed, finding our approach to outperform prior work. We then perform experiments with sequence labeling tasks in Sec. 7.2. (Table 1: Test F1 when comparing methods on multi-label classification datasets.) |
| Researcher Affiliation | Academia | Lifu Tu Kevin Gimpel Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA {lifu,kgimpel}@ttic.edu |
| Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks, i.e., nothing explicitly labeled or formatted as a code-like procedure. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | We use the MLC datasets used by Belanger & McCallum (2016): Bibtex, Delicious, and Bookmarks. Dataset statistics are shown in Table 7 in the Appendix. For Twitter part-of-speech (POS) tagging, we use the annotated data from Gimpel et al. (2011) and Owoputi et al. (2013) which contains L = 25 POS tags. |
| Dataset Splits | Yes | For validation, we use the 500-tweet OCT27TEST set and for testing we use the 547-tweet DAILY547 test set. For Bookmarks, we use the same train/dev/test split as Belanger & McCallum (2016). |
| Hardware Specification | No | The Acknowledgments thank "NVIDIA Corporation for donating GPUs used in this research", but the paper does not specify exact GPU models, CPU models, or any other hardware details for running the experiments. |
| Software Dependencies | No | The paper mentions using optimizers like Adam and libraries/tools like word2vec and GloVe, but does not provide specific version numbers for these software components or any other ancillary software dependencies. |
| Experiment Setup | Yes | We pretrain the feature networks F(x) by minimizing independent-label cross entropy for 10 epochs using Adam (Kingma & Ba, 2014) with learning rate 0.001. We tune λ (the L2 regularization strength for Θ) over the set {0.01, 0.001, 0.0001}. The classification threshold τ is chosen from [0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75]. |
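
The quoted experiment setup (pretraining the feature network F(x) with independent-label cross entropy for 10 epochs using Adam at learning rate 0.001, tuning the L2 strength λ, and selecting the classification threshold τ on a dev set) can be illustrated with a minimal sketch. This is not the authors' code: the `FeatureNet` architecture, hidden size, data shapes, and the example-averaged F1 used for threshold selection are illustrative assumptions; only the optimizer, learning rate, epoch count, and the λ/τ grids come from the quote.

```python
# Hedged sketch of the quoted setup, not the authors' implementation.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Hypothetical feed-forward feature network F(x) producing per-label logits."""
    def __init__(self, n_features, n_labels, hidden=150):  # hidden size is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_labels),
        )

    def forward(self, x):
        return self.net(x)

def pretrain(model, loader, epochs=10, lr=1e-3, weight_decay=0.0):
    # Independent-label cross entropy = per-label binary cross entropy.
    # weight_decay stands in for the L2 strength lambda, tuned over
    # {0.01, 0.001, 0.0001} in the quoted setup.
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y.float()).backward()
            opt.step()
    return model

def f1_score(pred, gold):
    # Example-averaged multi-label F1 (one plausible choice; the paper's
    # exact averaging is not restated in the quote).
    tp = (pred * gold).sum(dim=1).float()
    denom = pred.sum(dim=1) + gold.sum(dim=1)
    f1 = torch.where(denom > 0, 2 * tp / denom.clamp(min=1), torch.ones_like(tp))
    return f1.mean().item()

def tune_threshold(model, x_dev, y_dev):
    # Pick tau from the grid quoted above by dev-set F1.
    taus = [0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.15, 0.2, 0.25,
            0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75]
    with torch.no_grad():
        probs = torch.sigmoid(model(x_dev))
    return max(taus, key=lambda t: f1_score((probs > t).long(), y_dev.long()))
```

In use, one would pretrain a copy of the model for each λ in {0.01, 0.001, 0.0001}, keep the copy with the best dev F1, and then call `tune_threshold` on the dev set before reporting test F1, mirroring the tuning procedure described in the quote.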