Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks
Authors: Amirmohammad Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted training of SPENs in three settings with different reward functions: 1) multi-label classification, with the reward function defined as the F1 score between predicted and target labels; 2) citation-field extraction, with a human-written reward function; 3) shape parsing, with a task-specific reward function. Table 1.B shows the performance of SG-SPEN, R-SPEN, and DVN on this task. |
| Researcher Affiliation | Academia | Amirmohammad Rooshenas, Dongxu Zhang, Gopal Sharma, and Andrew McCallum, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003 |
| Pseudocode | Yes | Algorithm 1 Search-guided training of SPENs |
| Open Source Code | No | The paper does not provide any specific link to source code or state that code is available in supplementary materials or upon request. |
| Open Datasets | Yes | We consider the task of multi-label classification on the Bibtex dataset, with 159 labels and 1839 input variables, and the Bookmarks dataset, with 208 labels and 2150 input variables. We used the Cora citation dataset (Seymore et al., 1999), including 100 labeled examples as the validation set and another 100 labeled examples for the test set. We generated 2000 different image-program pairs based on Sharma et al. (2018), including 1400 training pairs, 300 pairs for the validation set, and 300 pairs for the test set. |
| Dataset Splits | Yes | We used the Cora citation dataset (Seymore et al., 1999), including 100 labeled examples as the validation set and another 100 labeled examples for the test set. We generated 2000 different image-program pairs based on Sharma et al. (2018), including 1400 training pairs, 300 pairs for the validation set, and 300 pairs for the test set. |
| Hardware Specification | Yes | Iterative beam search with a beam size of ten gets about 39.0% accuracy; however, inference takes more than a minute per test example on a 10-core CPU. |
| Software Dependencies | No | The paper mentions general software components like "deep neural networks" and specific algorithms, but does not provide version numbers for any libraries, frameworks, or software dependencies. |
| Experiment Setup | No | The paper mentions hyperparameters such as "c is the regularization hyper-parameter", "δ > 0 is the search margin", and "where > 1 is a task-dependent scalar", and refers to "Appendix E [which] includes a detailed description of baselines and hyper-parameters", but does not provide their specific numerical values or detailed training configurations in the main text. |
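For the multi-label classification setting above, the reward is the F1 score between the predicted and target label sets. The following is a minimal illustrative sketch of such a reward function, not the authors' implementation; the function name and the convention for two empty label sets are assumptions.

```python
def f1_reward(predicted, target):
    """F1 score between predicted and target label-index sets (hypothetical helper)."""
    predicted, target = set(predicted), set(target)
    if not predicted and not target:
        return 1.0  # assumed convention: two empty label sets agree perfectly
    tp = len(predicted & target)  # true positives: labels present in both sets
    if tp == 0:
        return 0.0  # no overlap (also covers division-by-zero cases)
    precision = tp / len(predicted)
    recall = tp / len(target)
    return 2 * precision * recall / (precision + recall)

# Example: two of three predicted labels are correct, out of three targets
print(f1_reward([1, 2, 3], [2, 3, 4]))  # precision = recall = 2/3, so F1 = 2/3
```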