Global Greedy Dependency Parsing
Authors: Zuchao Li, Hai Zhao, Kevin Parnow (pp. 8319-8326)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using multiple benchmark treebanks, including the Penn Treebank (PTB), the CoNLL-X treebanks, and the Universal Dependency Treebanks, we evaluate our parser and demonstrate that the proposed novel parser achieves good performance with faster training and decoding. From the evaluation results on the benchmark treebanks, our proposed model gives significant improvements when compared to the baseline parser. |
| Researcher Affiliation | Academia | Zuchao Li,1,2,3 Hai Zhao,1,2,3 Kevin Parnow1,2,3 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University {charlee, parnow}@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes algorithmic steps in narrative text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Our code is available at https://github.com/bcmi220/ggdp. |
| Open Datasets | Yes | For English, we use the Stanford Dependency (SD 3.3.0) (De Marneffe and Manning 2008) conversion of the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993), and follow the standard splitting convention for PTB, using sections 2-21 for training, section 22 as a development set and section 23 as a test set. ... For the CoNLL Treebanks, we use the English treebank from the CoNLL-2008 shared task (Surdeanu et al. 2008) and all 13 treebanks from the CoNLL-X shared task (Buchholz and Marsi 2006). ... For UD Treebanks, following the selection of (Ma et al. 2018), we take 12 treebanks from UD version 2.1 (Nivre et al. 2017): Bulgarian (bg), Catalan (ca), Czech (cs), Dutch (nl), English (en), French (fr), German (de), Italian (it), Norwegian (no), Romanian (ro), Russian (ru) and Spanish (es). |
| Dataset Splits | Yes | For English, we use the Stanford Dependency (SD 3.3.0) (De Marneffe and Manning 2008) conversion of the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993), and follow the standard splitting convention for PTB, using sections 2-21 for training, section 22 as a development set and section 23 as a test set. We adopt the standard training/dev/test splits and use the universal POS tags provided in each treebank for all the languages. |
| Hardware Specification | Yes | The experimental environment is on the same machine with Intel i9 9900k CPU and NVIDIA 1080Ti GPU. |
| Software Dependencies | No | The paper mentions software components like GloVe, fastText, ELMo, BERT, and Adam optimizer but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The character embeddings are 8-dimensional and randomly initialized. In the character CNN, the convolutions have a window size of 3 and consist of 50 filters. We use 3 stacked bidirectional LSTMs with 512-dimensional hidden states each. The outputs of the BiLSTM feed a 512-dimensional MLP layer for the arc scorer, a 128-dimensional MLP layer for the relation scorer, and a 128-dimensional MLP layer for the parsing order scorer, all using ReLU as their activation function. Additionally, since we treat the parsing order score as a classification problem over parse tree layers, we set its range to [0, 1, ..., 32]. Training: Parameter optimization is performed with the Adam optimizer with β1 = β2 = 0.9. We choose an initial learning rate of η0 = 0.001. The learning rate η is annealed by multiplying a fixed decay rate ρ = 0.75 when parsing performance stops increasing on validation sets. To reduce the effects of an exploding gradient, we use gradient clipping of 5.0. For the BiLSTM, we use recurrent dropout with a drop rate of 0.33 between hidden states and 0.33 between layers. Following (Dozat and Manning 2017), we also use embedding dropout with a rate of 0.33 on all word, character, and POS tag embeddings. |
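
The PTB split convention quoted in the "Dataset Splits" row above can be written as a small configuration. This is a minimal sketch only; the section numbers come from the quote, while the dictionary name and layout are illustrative and not part of the authors' released code.

```python
# Standard PTB splitting convention for the Stanford Dependency (SD 3.3.0)
# conversion, as quoted above. Name and structure are illustrative only.
PTB_SPLITS = {
    "train": list(range(2, 22)),  # WSJ sections 2-21
    "dev": [22],                  # WSJ section 22
    "test": [23],                 # WSJ section 23
}
```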
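
As a reading aid for the "Experiment Setup" row, the following is a minimal PyTorch-style sketch of the quoted training configuration: Adam with β1 = β2 = 0.9, an initial learning rate of 0.001, a 0.75 decay factor when dev-set performance plateaus, and gradient clipping at 5.0. The encoder, function names, and input size are placeholders and assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
from torch import nn, optim

# Placeholder encoder standing in for the paper's 3 stacked BiLSTMs
# (512-dimensional hidden states, dropout 0.33 between layers); the
# character CNN and the arc/relation/order scorer MLPs are omitted,
# and the input size of 100 is an arbitrary assumption.
encoder = nn.LSTM(input_size=100, hidden_size=512, num_layers=3,
                  bidirectional=True, batch_first=True, dropout=0.33)

# Adam with beta1 = beta2 = 0.9 and an initial learning rate of 0.001.
optimizer = optim.Adam(encoder.parameters(), lr=1e-3, betas=(0.9, 0.9))

# Multiply the learning rate by 0.75 when dev-set parsing accuracy plateaus.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                 factor=0.75)

def optimize(loss: torch.Tensor) -> None:
    """Backpropagate one loss value with gradient clipping at 5.0."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=5.0)
    optimizer.step()

# After each epoch, call scheduler.step(dev_score), where dev_score is the
# development-set parsing metric used to detect a plateau (hypothetical name).
```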