Dependency Grammar Induction with a Neural Variational Transition-Based Parser

Authors: Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller

AAAI 2019, pp. 6658-6665

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response.
Research Type: Experimental. "When evaluating on the English Penn Treebank and on eight languages from the Universal Dependency (UD) Treebank, we find that our model with posterior regularization outperforms the best unsupervised transition-based dependency parser and approaches the performance of graph-based models. We also show how a weak form of supervision can be integrated elegantly into our framework in the form of rule expectations. Finally, we present empirical evidence for the complexity advantage of transition-based models: our model attains a large speed-up compared to a state-of-the-art graph-based model. Code and Supplementary Material are available."
Researcher Affiliation: Academia. Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK. {bowen.li, jianpeng.cheng, yang.liu2}@ed.ac.uk, keller@inf.ed.ac.uk
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: Yes. "Code and Supplementary Material are available." https://github.com/libowen2121/VI-dependency-syntax
Open Datasets: Yes. English Penn Treebank: "We use the Wall Street Journal (WSJ) section of the English Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993). The dataset is preprocessed to strip off punctuation. We train our model on sections 2-21, tune the hyperparameters on section 22, and evaluate on section 23. Sentences of length ≤ 10 are used for training, and we report directed dependency accuracy (DDA) on test sentences of length ≤ 10 (WSJ-10), and on all sentences (WSJ)." Universal Dependency Treebank: "We select eight languages from the Universal Dependency Treebank 1.4 (Nivre et al. 2016)."
Dataset Splits: Yes. "We train our model on sections 2-21, tune the hyperparameters on section 22, and evaluate on section 23."
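The data protocol quoted in the two entries above (WSJ sections 2-21 for training, 22 for tuning, 23 for evaluation; training on sentences of length ≤ 10; DDA as the metric) can be summarized in a short sketch. This is a minimal illustration, not the authors' code: the load_conll helper, the file layout under wsj/, and the column positions are assumptions, and punctuation stripping (with the head re-indexing it requires) is omitted for brevity.

```python
# Sketch of the evaluation protocol described above (assumed file layout and loader).

def load_conll(path):
    """Yield sentences as lists of (token, pos, gold_head) triples from a CoNLL-style file."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if sentence:
                    yield sentence
                    sentence = []
                continue
            cols = line.split("\t")
            sentence.append((cols[1], cols[3], int(cols[6])))  # form, POS tag, head index
    if sentence:
        yield sentence

def directed_dependency_accuracy(sentences, predict_heads):
    """DDA: fraction of tokens whose predicted head index matches the gold head index."""
    correct = total = 0
    for sent in sentences:
        gold = [head for _, _, head in sent]
        pred = predict_heads(sent)  # parser under evaluation
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / max(total, 1)

# WSJ sections 2-21 for training, 22 for tuning, 23 for evaluation (hypothetical paths).
train = [s for sec in range(2, 22) for s in load_conll(f"wsj/{sec:02d}.conll")]
dev = list(load_conll("wsj/22.conll"))
test = list(load_conll("wsj/23.conll"))

train_10 = [s for s in train if len(s) <= 10]  # train on sentences of length <= 10
wsj_10 = [s for s in test if len(s) <= 10]     # report DDA on WSJ-10 and on full WSJ
# dda = directed_dependency_accuracy(wsj_10, my_parser)   # my_parser is a placeholder
```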
Hardware Specification: No. The paper states: "All experiments were conducted on the same CPU platform." This does not specify the CPU model, memory, or number of machines.
Software Dependencies: No. The paper mentions AdaGrad (Duchi, Hazan, and Singer 2011) and the projected gradient descent algorithm (Bertsekas 1999) for optimization, GloVe embeddings (Pennington, Socher, and Manning 2014) and FastText embeddings (Bojanowski et al. 2016) for initialization, and Brown clustering (Brown et al. 1992) for features. However, it does not specify version numbers for any of these software components or libraries.
Experiment Setup: Yes. "To avoid a scenario in which REINFORCE has to work with an arbitrarily initialized encoder and decoder, our posterior regularized neural variational dependency parser is pretrained with the direct reward from PR. We use AdaGrad (Duchi, Hazan, and Singer 2011) to optimize the parameters of the encoder and decoder, as well as the projected gradient descent algorithm (Bertsekas 1999) to optimize the parameters of posterior regularization. We use GloVe embeddings (Pennington, Socher, and Manning 2014) to initialize English word vectors and FastText embeddings (Bojanowski et al. 2016) for the other languages. Across all experiments, we test both unlexicalized and lexicalized versions of our models. The unlexicalized versions use gold POS tags as model inputs, while the lexicalized versions additionally use word tokens (Le and Zuidema 2015). We use Brown clustering (Brown et al. 1992) to obtain additional features in the lexicalized versions (Buys and Blunsom 2015)."
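The optimization recipe quoted above (AdaGrad for the encoder and decoder, projected gradient descent for the posterior-regularization parameters) can be sketched as follows. This is a minimal, PyTorch-style illustration under stated assumptions, not the authors' implementation: the encoder and decoder modules, the loss terms, the learning rates, and the nonnegative-orthant projection standing in for the actual PR constraint set are all placeholders.

```python
# Sketch: AdaGrad for encoder/decoder parameters, projected gradient descent for PR parameters.
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=100, hidden_size=128, batch_first=True)  # placeholder encoder
decoder = nn.Linear(128, 3)                                           # placeholder decoder (e.g., transition scores)
pr_params = torch.zeros(8, requires_grad=True)                        # placeholder PR parameters

opt_model = torch.optim.Adagrad(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.01)

def model_loss(batch):
    """Placeholder for the variational objective / direct PR reward."""
    h, _ = encoder(batch)
    return decoder(h[:, -1]).pow(2).mean()

def pr_objective(params):
    """Placeholder for the PR objective in its parameters."""
    return (params ** 2).sum()

for step in range(100):
    batch = torch.randn(4, 10, 100)  # stand-in for a batch of encoded sentences

    # AdaGrad update for encoder and decoder parameters.
    opt_model.zero_grad()
    loss = model_loss(batch)
    loss.backward()
    opt_model.step()

    # Projected gradient descent for the PR parameters: take a gradient step,
    # then project back onto the feasible set (a nonnegative orthant here,
    # standing in for whatever constraint set the paper's PR formulation uses).
    pr_loss = pr_objective(pr_params)
    grad, = torch.autograd.grad(pr_loss, pr_params)
    with torch.no_grad():
        pr_params -= 0.05 * grad
        pr_params.clamp_(min=0.0)
```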