Deep Biaffine Attention for Neural Dependency Parsing
Authors: Timothy Dozat, Christopher D. Manning
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our parser gets state of the art or near state of the art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark, outperforming Kiperwasser & Goldberg (2016) by 1.8% and 2.2%, and comparable to the highest performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches. |
| Researcher Affiliation | Academia | Timothy Dozat Stanford University tdozat@stanford.edu Christopher D. Manning Stanford University manning@stanford.edu |
| Pseudocode | No | The paper describes the model and processes using text, equations, and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to or explicitly state the release of its source code. |
| Open Datasets | Yes | We show test results for the proposed model on the English Penn Treebank, converted into Stanford Dependencies using both version 3.3.0 and version 3.5.0 of the Stanford Dependency converter (PTB-SD 3.3.0 and PTB-SD 3.5.0); the Chinese Penn Treebank; and the CoNLL 09 shared task dataset, following standard practices for each dataset. |
| Dataset Splits | Yes | Our hyperparameter search was done with the PTB-SD 3.5.0 validation dataset in order to minimize overfitting to the more popular PTB-SD 3.3.0 benchmark, and in our hyperparameter analysis in the following section we report performance on the PTB-SD 3.5.0 test set, shown in Tables 2 and 3. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type) used for running the experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow" in a footnote but does not provide a specific version number for it or any other software dependencies. |
| Experiment Setup | Yes | Aside from architectural differences between ours and the other graph-based parsers, we make a number of hyperparameter choices that allow us to outperform theirs, laid out in Table 1. Table 1 lists: Embedding size 100, Embedding dropout 33%, LSTM size 400, LSTM dropout 33%, Arc MLP size 500, Arc MLP dropout 33%, Label MLP size 100, Label MLP dropout 33%, LSTM depth 3, MLP depth 1, α 2e-3, β1, β2 .9, Annealing .75^(t/5000), t_max 50,000. We optimize the network with annealed Adam (Kingma & Ba, 2014) for about 50,000 steps, rounded up to the nearest epoch. (A schedule sketch follows the table.) |
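
To make the reported setup concrete, below is a minimal Python sketch of the hyperparameters listed in the Experiment Setup row and of the annealed Adam learning-rate schedule, assuming the annealing multiplies the base rate α = 2e-3 by 0.75 for every 5,000 training steps (i.e., α · 0.75^(t/5000)). The names `HPARAMS` and `lr_at_step` are ours for illustration; the paper releases no code, so this is not the authors' implementation.

```python
# Sketch of the hyperparameters reported for Dozat & Manning (2017).
# All names here are illustrative; values are taken from the table row above.

HPARAMS = {
    "embedding_size": 100,
    "embedding_dropout": 0.33,
    "lstm_size": 400,
    "lstm_depth": 3,
    "lstm_dropout": 0.33,
    "arc_mlp_size": 500,
    "arc_mlp_dropout": 0.33,
    "label_mlp_size": 100,
    "label_mlp_dropout": 0.33,
    "mlp_depth": 1,
    "adam_alpha": 2e-3,     # initial learning rate (alpha)
    "adam_beta1": 0.9,      # beta1
    "adam_beta2": 0.9,      # beta2
    "anneal_factor": 0.75,  # assumed decay multiplier per annealing unit
    "anneal_every": 5000,   # assumed number of steps per annealing unit
    "t_max": 50_000,        # total training steps, rounded up to an epoch
}


def lr_at_step(t: int) -> float:
    """Annealed Adam learning rate: alpha * anneal_factor ** (t / anneal_every)."""
    return HPARAMS["adam_alpha"] * HPARAMS["anneal_factor"] ** (t / HPARAMS["anneal_every"])


if __name__ == "__main__":
    # Print the learning rate at a few points in training.
    for t in (0, 5_000, 25_000, HPARAMS["t_max"]):
        print(f"step {t:>6}: lr = {lr_at_step(t):.6f}")
```

Under this reading of the schedule, the learning rate decays smoothly from 2e-3 at step 0 to roughly 1.1e-4 by step 50,000, which matches the "annealed Adam for about 50,000 steps" description above.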