SparseMAP: Differentiable Sparse Structured Inference
Authors: Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the SparseMAP losses against the commonly used CRF and structured SVM losses. The task we focus on is non-projective dependency parsing: a structured output task consisting of predicting the directed tree of grammatical dependencies between words in a sentence (Jurafsky & Martin, 2018, Ch. 14). We use annotated Universal Dependency data (Nivre et al., 2016), as used in the CoNLL 2017 shared task (Zeman et al., 2017). |
| Researcher Affiliation | Collaboration | ¹Cornell University, Ithaca, NY; ²Unbabel & Instituto de Telecomunicações, Lisbon, Portugal; ³NTT Communication Science Laboratories, Kyoto, Japan. Correspondence to: Vlad Niculae <vlad@vene.ro>, André F. T. Martins <andre.martins@unbabel.com>, Mathieu Blondel <mathieu@mblondel.org>, Claire Cardie <cardie@cs.cornell.edu>. |
| Pseudocode | No | The paper describes algorithms such as Conditional gradient and Active set method, stating 'We provide a full description of both methods in Appendix A.' However, no pseudocode or algorithm blocks are present in the provided main paper text. |
| Open Source Code | Yes | General-purpose dynet and pytorch implementations available at https://github.com/vene/sparsemap. |
| Open Datasets | Yes | We use annotated Universal Dependency data (Nivre et al., 2016), as used in the CoNLL 2017 shared task (Zeman et al., 2017). We evaluate the two models alongside the softmax baseline on the SNLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2018) datasets. |
| Dataset Splits | Yes | All models are trained by SGD, with 0.9 learning rate decay at epochs when the validation accuracy is not the best seen. We split the MultiNLI matched validation set into equal validation and test sets; for SNLI we use the provided split. (A sketch of this split appears after the table.) |
| Hardware Specification | No | The paper mentions 'GPU memory copying' but does not provide specific hardware details such as GPU model, CPU type, or memory specifications used for the experiments. |
| Software Dependencies | Yes | All models are implemented using the dynet library v2.0.2 (Neubig et al., 2017). General-purpose dynet and pytorch implementations available at https://github.com/vene/sparsemap. |
| Experiment Setup | Yes | Parameters are trained using Adam (Kingma & Ba, 2015), tuning the learning rate on the grid {.5, 1, 2, 4, 8} × 10⁻³, expanded by a factor of 2 if the best model is at either end. All models are trained by SGD, with 0.9 learning rate decay at epochs when the validation accuracy is not the best seen. We tune the learning rate on the grid {2^k : k ∈ {−6, −5, −4, −3}}, extending the range if the best model is at either end. (A hedged sketch of this grid-expansion protocol follows the table.) |
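
The Experiment Setup row describes a grid search over learning rates that is extended whenever the best model falls at an edge of the grid. Below is a minimal sketch of that protocol, assuming a hypothetical `train_and_validate(lr)` callable that trains a model at a given learning rate and returns its validation accuracy; the function name, the expansion cap, and the default grid (the Adam grid quoted above) are assumptions for illustration, not code from the paper.

```python
# Minimal sketch (not the authors' code) of the learning-rate grid protocol:
# search a fixed grid and, if the best configuration sits at either end of the
# grid, extend the grid by a factor of 2 in that direction and retry.
# `train_and_validate` is a hypothetical callable returning validation accuracy.

def tune_learning_rate(train_and_validate,
                       grid=(0.5e-3, 1e-3, 2e-3, 4e-3, 8e-3),
                       max_expansions=4):
    """Return (best_lr, best_score) after grid search with edge expansion."""
    scores = {lr: train_and_validate(lr) for lr in grid}
    for _ in range(max_expansions):
        best_lr = max(scores, key=scores.get)
        if best_lr == min(scores):       # best at the low end: halve the smallest rate
            new_lr = best_lr / 2.0
        elif best_lr == max(scores):     # best at the high end: double the largest rate
            new_lr = best_lr * 2.0
        else:                            # best rate is interior: stop expanding
            break
        scores[new_lr] = train_and_validate(new_lr)
    best_lr = max(scores, key=scores.get)
    return best_lr, scores[best_lr]
```

The same protocol would apply to the SGD-trained models with the {2^k : k ∈ {−6, …, −3}} grid quoted in the same row.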
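
The Dataset Splits row also states that the MultiNLI matched validation set is divided into equal validation and test halves. A minimal sketch of such a split is shown below; the shuffling, seed, and function name are assumptions rather than details taken from the paper.

```python
# Minimal sketch (an assumption, not the authors' preprocessing) of dividing
# the MultiNLI matched validation set into equal validation and test halves.
import random

def split_matched_validation(examples, seed=0):
    """Shuffle and split a list of examples into (validation, test) halves."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    half = len(examples) // 2
    return examples[:half], examples[half:]
```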