Self-attentive Biaffine Dependency Parsing
Authors: Ying Li, Zhenghua Li, Min Zhang, Rui Wang, Sheng Li, Luo Si
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on both the English and Chinese datasets demonstrate that the two encoders achieve similar overall performance, but further detailed analysis reveals a lot of local divergence with regard to dependency distances. |
| Researcher Affiliation | Collaboration | 1Institute of Artificial Intelligence, School of Computer Science and Technology, Soochow University 2Alibaba Group, China |
| Pseudocode | No | The paper includes figures illustrating models but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no repository link or explicit statement about releasing the source code for its method. Links to third-party tools (ELMo, BERT) are given, but not to the authors' own implementation. |
| Open Datasets | Yes | We conduct experiments on the English Penn Treebank (PTB) dataset with Stanford dependencies and the Chinese dataset of the CoNLL-2009 shared task. For PTB, we directly use the data of Chen and Manning [2014], who use the Stanford POS tagger. For Chinese CoNLL-2009, we use the data split and POS tags provided by the organizers [Hajic et al., 2009]. |
| Dataset Splits | Yes | Each parser is trained for at most 1,000 iterations, and the performance is evaluated on the dev data after each iteration for model selection. We stop the training if the peak performance does not increase in 100 consecutive iterations. ... First, we separately train 6 parsers in the n-fold jack-knifing way, where each parser is trained on a 5/6 subset of the whole training data. (A jack-knifing split sketch follows this table.) |
| Hardware Specification | Yes | It takes about 7 days using 6 GPU nodes (GTX 1080Ti). |
| Software Dependencies | No | The paper mentions specific software tools such as ELMo and BERT, but does not give version numbers for these or for other key software components needed for reproducibility (e.g., AllenNLP and the BERT-Base models are cited without versions). |
| Experiment Setup | Yes | The dimensions of the word/tag/position embeddings are 100/100/200; d_model is 200; d_ff is 800; head number m is 8, and thus d_head is 25; dropout ratio before residual connection is 0.2; dropout ratios before entering multi-head attention and feed-forward are both 0.1; all other dropout ratios are 0.33; β1 = 0.9, β2 = 0.98 and ϵ = 10⁻⁶ for the Adam optimizer. (A hedged configuration sketch follows this table.) |
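
The 6-fold jack-knifing mentioned in the Dataset Splits row can be illustrated with a short sketch. The paper does not release code, so the function below is only a minimal, assumed rendering of the stated idea (6 folds, each parser trained on the complementary 5/6 subset); the function name, shuffling, and seeding are all hypothetical.

```python
import random

def jackknife_folds(sentences, n_folds=6, seed=0):
    """Yield (fold_id, train_indices, held_out_indices) so that parser i
    is trained on a 5/6 subset and used to annotate the remaining 1/6."""
    idx = list(range(len(sentences)))
    random.Random(seed).shuffle(idx)  # assumption: random fold assignment
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield i, train, held_out
```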
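
The hyperparameters quoted in the Experiment Setup row can be gathered into a configuration sketch. This is not the authors' code: the dictionary keys, the stand-in module, and the learning rate are assumptions; only the numeric values and the Adam settings (β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁶) come from the paper.

```python
import torch

# Values reported in the paper; key names are assumed for illustration.
config = {
    "word_emb_dim": 100,
    "tag_emb_dim": 100,
    "position_emb_dim": 200,
    "d_model": 200,            # encoder hidden size
    "d_ff": 800,               # position-wise feed-forward inner size
    "num_heads": 8,            # d_head = d_model / num_heads = 25
    "dropout_residual": 0.2,   # before the residual connection
    "dropout_attention": 0.1,  # before multi-head attention
    "dropout_ff": 0.1,         # before the feed-forward sublayer
    "dropout_other": 0.33,     # all remaining dropout ratios
}

# Adam as reported; `model` and the learning rate are placeholders,
# since the quoted excerpt does not state the learning rate.
model = torch.nn.Linear(config["d_model"], config["d_model"])
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3,
                             betas=(0.9, 0.98), eps=1e-6)
```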