Deep Biaffine Attention for Neural Dependency Parsing

Authors: Timothy Dozat, Christopher D. Manning

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our parser gets state of the art or near state of the art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark, outperforming Kiperwasser & Goldberg (2016) by 1.8% and 2.2%, and comparable to the highest performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.
Researcher Affiliation | Academia | Timothy Dozat, Stanford University, tdozat@stanford.edu; Christopher D. Manning, Stanford University, manning@stanford.edu
Pseudocode | No | The paper describes the model and processes using text, equations, and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to or explicitly state the release of its source code.
Open Datasets | Yes | We show test results for the proposed model on the English Penn Treebank, converted into Stanford Dependencies using both version 3.3.0 and version 3.5.0 of the Stanford Dependency converter (PTB-SD 3.3.0 and PTB-SD 3.5.0); the Chinese Penn Treebank; and the CoNLL 09 shared task dataset, following standard practices for each dataset.
Dataset Splits | Yes | Our hyperparameter search was done with the PTB-SD 3.5.0 validation dataset in order to minimize overfitting to the more popular PTB-SD 3.3.0 benchmark, and in our hyperparameter analysis in the following section we report performance on the PTB-SD 3.5.0 test set, shown in Tables 2 and 3.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type) used for running the experiments.
Software Dependencies | No | The paper mentions "TensorFlow" in a footnote but does not provide a specific version number for it or any other software dependencies.
Experiment Setup | Yes | Aside from architectural differences between ours and the other graph-based parsers, we make a number of hyperparameter choices that allow us to outperform theirs, laid out in Table 1. Table 1 lists: Embedding size 100, Embedding dropout 33%, LSTM size 400, LSTM dropout 33%, Arc MLP size 500, Arc MLP dropout 33%, Label MLP size 100, Label MLP dropout 33%, LSTM depth 3, MLP depth 1, α = 2e-3, β1 = β2 = 0.9, Annealing 0.75, t = 5,000, tmax = 50,000. We optimize the network with annealed Adam (Kingma & Ba, 2014) for about 50,000 steps, rounded up to the nearest epoch.
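
Since the paper releases no code, the sketch below simply collects the Table 1 hyperparameters and the annealed-Adam learning-rate schedule in one place for reference. The names HYPERPARAMS and annealed_learning_rate are hypothetical, and the schedule assumes the 0.75 annealing factor is applied as α · 0.75^(t/5000) over the 50,000 training steps; this is one reading of the Table 1 entries, not the authors' released configuration.

```python
# Illustrative reconstruction of the reported setup; not the authors' code.
# Values are taken from Table 1 of Dozat & Manning (2017); the annealing
# form alpha * 0.75 ** (t / 5000) is an assumption about how the "Annealing
# 0.75, t = 5,000" entries combine.

HYPERPARAMS = {
    "embedding_size": 100,      # word/POS embedding dimension
    "embedding_dropout": 0.33,
    "lstm_size": 400,
    "lstm_depth": 3,
    "lstm_dropout": 0.33,
    "arc_mlp_size": 500,
    "arc_mlp_dropout": 0.33,
    "label_mlp_size": 100,
    "label_mlp_dropout": 0.33,
    "mlp_depth": 1,
    "learning_rate": 2e-3,      # alpha
    "beta1": 0.9,
    "beta2": 0.9,
    "anneal_factor": 0.75,
    "anneal_every": 5000,       # t
    "max_steps": 50_000,        # tmax, rounded up to the nearest epoch
}

def annealed_learning_rate(step: int, hp: dict = HYPERPARAMS) -> float:
    """Learning rate under the assumed schedule: alpha * 0.75 ** (step / 5000)."""
    return hp["learning_rate"] * hp["anneal_factor"] ** (step / hp["anneal_every"])

if __name__ == "__main__":
    # Print the assumed learning rate at a few points during training.
    for step in (0, 5_000, 25_000, 50_000):
        print(f"step {step:>6}: lr = {annealed_learning_rate(step):.6f}")
```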