Predicting the Argumenthood of English Prepositional Phrases
Authors: Najoung Kim, Kyle Rawlins, Benjamin Van Durme, Paul Smolensky (pp. 6578–6585)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose two PP argumenthood prediction tasks branching from these two motivations: (1) binary argument-adjunct classification of PPs in VerbNet, and (2) gradient argumenthood prediction using human judgments as gold standard, and report results from prediction models that use pretrained word embeddings and other linguistically informed features. Our best results on each task are (1) acc. = 0.955, F1 = 0.954 (ELMo+BiLSTM) and (2) Pearson's r = 0.624 (word2vec+MLP). Furthermore, we demonstrate the utility of argumenthood prediction in improving sentence representations via performance gains on SRL when a sentence encoder is pretrained with our tasks. (A minimal classifier sketch follows the table.) |
| Researcher Affiliation | Collaboration | Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Microsoft Research AI, Redmond, WA, USA |
| Pseudocode | No | The paper includes equations for the classifier but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions publicly available pretrained embeddings and the use of the AllenNLP toolkit, but does not provide a link to the authors' own implementation of the methodology described in the paper. The new dataset is noted as 'To be released at: decomp.io'. |
| Open Datasets | Yes | We use VerbNet subcategorization frames to define the argument-adjunct status of a verb-PP combination. PropBank (Palmer, Gildea, and Kingsbury 2005) addresses this issue... We chose VerbNet over PropBank ARG-N and AM labels... We chose NLI datasets as the source of full-sentence inputs over other parsed corpora such as the Penn Treebank... Stanford Natural Language Inference (SNLI; Bowman et al. 2015) and Multi-Genre Natural Language Inference (MNLI; Williams, Nangia, and Bowman 2018) datasets |
| Dataset Splits | Yes | This balanced dataset (n = 27,088) is randomly split into 70:15:15 train:dev:test sets. We use 15% of the dataset as a development set, and train/test using 10-fold cross-validation on the remaining 85% rather than reporting performance on a fixed test split. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the 'AllenNLP toolkit' and the 'Python library scikit-learn' but does not specify version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The models are trained using Adadelta (Zeiler 2012) with cross-entropy loss and batch size = 32. Various features in addition to the embeddings of verbs and prepositions were also tested. The features we experimented with include semantic proto-role property scores (Reisinger et al. 2015) of the target PP (normalized mean across 5 annotators), mutual information (MI) (Aldezabal et al. 2002), word embeddings of the nominal head token of the NP under the PP in question, existence of a direct object, and various interaction terms between the features (e.g., additive, subtractive, inner/outer products). (A training-setup sketch follows the table.) |
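
The Research Type row mentions an MLP over pretrained word embeddings for argumenthood prediction. Below is a minimal sketch of such a classifier, assuming PyTorch and concatenated verb/preposition vectors; the class name `PPArgumenthoodMLP`, the layer sizes, and the architecture are illustrative assumptions, not the authors' released configuration (their best binary model was ELMo+BiLSTM).

```python
import torch
import torch.nn as nn

class PPArgumenthoodMLP(nn.Module):
    """Binary argument-vs-adjunct classifier over a (verb, preposition) pair.

    Hypothetical sketch: embedding and hidden dimensions are assumptions.
    """

    def __init__(self, emb_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),  # concatenated verb + preposition vectors
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),            # logits: argument vs. adjunct
        )

    def forward(self, verb_vec: torch.Tensor, prep_vec: torch.Tensor) -> torch.Tensor:
        # verb_vec, prep_vec: (batch, emb_dim) pretrained embeddings (e.g., word2vec)
        return self.net(torch.cat([verb_vec, prep_vec], dim=-1))
```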
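
The Dataset Splits row describes two protocols: a fixed 70:15:15 split, and a 15% held-out development set with 10-fold cross-validation over the remaining 85%. A minimal scikit-learn sketch of both, where `examples` is placeholder data and the random seeds are assumptions:

```python
from sklearn.model_selection import KFold, train_test_split

# Placeholder data standing in for the paper's verb-PP examples.
examples = [({"verb": "put", "prep": "on"}, 1)] * 100

# Fixed 70:15:15 train:dev:test split.
train, rest = train_test_split(examples, test_size=0.30, random_state=0)
dev, test = train_test_split(rest, test_size=0.50, random_state=0)

# 15% dev set, then 10-fold cross-validation on the remaining 85%.
cv_pool, dev = train_test_split(examples, test_size=0.15, random_state=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(cv_pool)):
    fold_train = [cv_pool[i] for i in train_idx]
    fold_test = [cv_pool[i] for i in test_idx]
    # ... train on fold_train, evaluate on fold_test, average across folds ...
```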
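
The Experiment Setup row states the optimizer (Adadelta), loss (cross-entropy), batch size (32), and interaction terms between embedding features. The sketch below wires those stated choices together in PyTorch; the toy data, stand-in model, dimensions, and epoch count are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def interaction_features(v: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """Interaction terms between verb (v) and preposition (p) embeddings:
    additive, subtractive, and inner/outer products, as listed in the setup."""
    inner = (v * p).sum(dim=-1, keepdim=True)                      # inner product
    outer = torch.einsum("bi,bj->bij", v, p).flatten(start_dim=1)  # flattened outer product
    return torch.cat([v + p, v - p, inner, outer], dim=-1)

# Toy embeddings and labels; sizes are illustrative.
v, p = torch.randn(640, 50), torch.randn(640, 50)
y = torch.randint(0, 2, (640,))
x = interaction_features(v, p)

model = nn.Linear(x.shape[1], 2)                      # stand-in classifier
optimizer = torch.optim.Adadelta(model.parameters())  # stated optimizer
criterion = nn.CrossEntropyLoss()                     # stated loss
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)  # stated batch size

for epoch in range(10):  # epoch count is an assumption
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
```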