Dependency Grammar Induction with a Neural Variational Transition-Based Parser
Authors: Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller (pp. 6658–6665)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When evaluating on the English Penn Treebank and on eight languages from the Universal Dependency (UD) Treebank, we find that our model with posterior regularization outperforms the best unsupervised transition-based dependency parser and approaches the performance of graph-based models. We also show how a weak form of supervision can be integrated elegantly into our framework in the form of rule expectations. Finally, we present empirical evidence for the complexity advantage of transition-based models: our model attains a large speed-up compared to a state-of-the-art graph-based model. Code and Supplementary Material are available. |
| Researcher Affiliation | Academia | Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB, UK {bowen.li, jianpeng.cheng, yang.liu2}@ed.ac.uk, keller@inf.ed.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and Supplementary Material are available: https://github.com/libowen2121/VI-dependency-syntax |
| Open Datasets | Yes | English Penn Treebank: We use the Wall Street Journal (WSJ) section of the English Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993). The dataset is preprocessed to strip off punctuation. We train our model on sections 2–21, tune the hyperparameters on section 22, and evaluate on section 23. Sentences of length ≤ 10 are used for training, and we report directed dependency accuracy (DDA) on test sentences of length ≤ 10 (WSJ-10) and on all sentences (WSJ). Universal Dependency Treebank: We select eight languages from the Universal Dependency Treebank 1.4 (Nivre et al. 2016). (An illustrative preprocessing sketch appears after the table.) |
| Dataset Splits | Yes | We train our model on sections 2–21, tune the hyperparameters on section 22, and evaluate on section 23. |
| Hardware Specification | No | The paper states: "All experiments were conducted on the same CPU platform." This does not identify specific hardware (CPU model, core count, or memory). |
| Software Dependencies | No | The paper mentions using "AdaGrad (Duchi, Hazan, and Singer 2011)" and the "projected gradient descent algorithm (Bertsekas 1999)" for optimization, "GloVe embeddings (Pennington, Socher, and Manning 2014)" and "FastText embeddings (Bojanowski et al. 2016)" for initialization, and "Brown clustering (Brown et al. 1992)" for features. However, it does not specify version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | To avoid a scenario in which REINFORCE has to work with an arbitrarily initialized encoder and decoder, our posterior regularized neural variational dependency parser is pretrained with the direct reward from PR. We use AdaGrad (Duchi, Hazan, and Singer 2011) to optimize the parameters of the encoder and decoder, as well as the projected gradient descent algorithm (Bertsekas 1999) to optimize the parameters of posterior regularization. We use GloVe embeddings (Pennington, Socher, and Manning 2014) to initialize English word vectors and FastText embeddings (Bojanowski et al. 2016) for the other languages. Across all experiments, we test both unlexicalized and lexicalized versions of our models. The unlexicalized versions use gold POS tags as model inputs, while the lexicalized versions additionally use word tokens (Le and Zuidema 2015). We use Brown clustering (Brown et al. 1992) to obtain additional features in the lexicalized versions (Buys and Blunsom 2015). (An illustrative setup sketch appears after the table.) |
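
The WSJ preprocessing quoted in the Open Datasets row follows a standard recipe: strip punctuation, split by treebank section, keep sentences of at most 10 tokens for training, and score parses with directed dependency accuracy (DDA). The sketch below is a minimal illustration of those steps with hypothetical helper names; the authors' actual pipeline is in their released repository.

```python
# Illustrative WSJ preprocessing sketch (hypothetical helpers, not the authors' code).
PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-", "#", "$"}

def strip_punctuation(sentence):
    """Drop punctuation tokens; `sentence` is a list of (word, pos_tag) pairs."""
    return [(word, pos) for (word, pos) in sentence if pos not in PUNCT_TAGS]

def split_by_section(sentences_with_sections):
    """Assign sentences to train/dev/test by WSJ section (2-21 / 22 / 23)."""
    train, dev, test = [], [], []
    for section, sentence in sentences_with_sections:
        if 2 <= section <= 21:
            train.append(sentence)
        elif section == 22:
            dev.append(sentence)
        elif section == 23:
            test.append(sentence)
    return train, dev, test

def training_subset(sentences, max_len=10):
    """Keep only short sentences (<= 10 tokens after punctuation removal) for training."""
    return [s for s in sentences if len(s) <= max_len]

def directed_dependency_accuracy(gold_heads, predicted_heads):
    """DDA: fraction of tokens whose predicted head index matches the gold head."""
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / max(len(gold_heads), 1)
```

The same length filter (≤ 10) defines the WSJ-10 test condition, while the full WSJ condition evaluates on all test sentences regardless of length.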
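
The Experiment Setup row names the optimization ingredients without giving code: AdaGrad for the encoder/decoder parameters, projected gradient descent for the posterior-regularization (PR) parameters, and pre-trained GloVe/FastText vectors for embedding initialization. The following is a minimal sketch of how such a setup could look, assuming PyTorch; the model and parameter names are hypothetical, and the projection is illustrated with a simple nonnegativity constraint rather than the paper's exact feasible set.

```python
# Minimal optimization-setup sketch, assuming PyTorch (not the authors' implementation).
import torch
import torch.nn as nn

def init_embeddings(vocab, pretrained_vectors, dim=100):
    """Build an embedding layer, copying in pre-trained vectors (e.g. GloVe/FastText)
    where available and leaving small random vectors for out-of-vocabulary words."""
    weight = 0.01 * torch.randn(len(vocab), dim)
    for idx, word in enumerate(vocab):
        if word in pretrained_vectors:
            weight[idx] = torch.as_tensor(pretrained_vectors[word], dtype=torch.float)
    return nn.Embedding.from_pretrained(weight, freeze=False)

def projected_gradient_step(params, grad, lr=0.1):
    """One projected-gradient-descent step: take a gradient step, then project back
    onto the feasible set (illustrated here as the nonnegative orthant via clamping)."""
    with torch.no_grad():
        params -= lr * grad
        params.clamp_(min=0.0)
    return params

# Encoder/decoder parameters would be optimized with AdaGrad, e.g.:
#   optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
# while the PR parameters are updated with projected_gradient_step after each
# evaluation of the posterior-regularization objective.
```

This separation mirrors the description in the row above: standard stochastic optimization for the neural encoder/decoder, and a constrained update for the PR parameters.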