Predicting the Argumenthood of English Prepositional Phrases

Authors: Najoung Kim, Kyle Rawlins, Benjamin Van Durme, Paul Smolensky

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose two PP argumenthood prediction tasks branching from these two motivations: (1) binary argument-adjunct classification of PPs in VerbNet, and (2) gradient argumenthood prediction using human judgments as the gold standard, and report results from prediction models that use pretrained word embeddings and other linguistically informed features. Our best results on each task are (1) acc. = 0.955, F1 = 0.954 (ELMo+BiLSTM) and (2) Pearson's r = 0.624 (word2vec+MLP). Furthermore, we demonstrate the utility of argumenthood prediction in improving sentence representations via performance gains on SRL when a sentence encoder is pretrained with our tasks. (A simplified classification sketch appears after this table.)
Researcher Affiliation | Collaboration | Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Microsoft Research AI, Redmond, WA, USA
Pseudocode | No | The paper includes equations for the classifier but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions publicly available pretrained embeddings and the use of the AllenNLP toolkit, but does not provide a link to the authors' own implementation code for the methodology described in the paper. The new dataset is stated as 'To be released at: decomp.io'.
Open Datasets | Yes | We use VerbNet subcategorization frames to define the argument-adjunct status of a verb-PP combination. PropBank (Palmer, Gildea, and Kingsbury 2005) addresses this issue... We chose VerbNet over PropBank ARG-N and AM labels... We chose NLI datasets as the source of full sentence inputs over other parsed corpora such as the Penn Treebank... Stanford Natural Language Inference (SNLI; Bowman et al. 2015) and Multi-Genre Natural Language Inference (MNLI; Williams, Nangia, and Bowman 2018) datasets
Dataset Splits | Yes | This balanced dataset (n = 27,088) is randomly split into 70:15:15 train:dev:test sets. We use 15% of the dataset as the development set, and train/test using 10-fold cross-validation on the remaining 85% rather than reporting performance on a fixed test split. (See the splitting sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using the 'AllenNLP toolkit' and the 'Python library scikit-learn' but does not specify version numbers for these software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | The models are trained using Adadelta (Zeiler 2012) with cross-entropy loss and batch size = 32. Various features in addition to the embeddings of verbs and prepositions were also tested. The features we experimented with include semantic proto-role property scores (Reisinger et al. 2015) of the target PP (normalized mean across 5 annotators), mutual information (MI) (Aldezabal et al. 2002), word embeddings of the nominal head token of the NP under the PP in question, the existence of a direct object, and various interaction terms between the features (e.g., additive, subtractive, inner/outer products). (See the training-loop sketch after this table.)
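
The Research Type row describes a binary argument/adjunct classification task over verb-PP pairs scored with accuracy and F1. Below is a minimal sketch of that setup, assuming precomputed verb and preposition embeddings; the MLPClassifier, feature dimensions, and random placeholder data are illustrative stand-ins, not the paper's ELMo+BiLSTM model that achieves the reported acc. = 0.955.

```python
# Minimal sketch: binary argument/adjunct classification from verb + preposition
# embeddings. All data below is random placeholder data, not the VerbNet-derived set.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.RandomState(0)
n, dim = 2000, 300
verb_emb = rng.randn(n, dim)   # placeholder verb embeddings (e.g., word2vec/GloVe)
prep_emb = rng.randn(n, dim)   # placeholder preposition embeddings
labels = rng.randint(0, 2, n)  # 1 = argument PP, 0 = adjunct PP (placeholder labels)

X = np.hstack([verb_emb, prep_emb])  # concatenate verb and preposition vectors
split = int(0.85 * n)                # simple held-out split for illustration

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=50, random_state=0)
clf.fit(X[:split], labels[:split])

pred = clf.predict(X[split:])
print("acc =", accuracy_score(labels[split:], pred))
print("F1  =", f1_score(labels[split:], pred))
```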
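
The Dataset Splits row describes holding out 15% of the data as a development set and running 10-fold cross-validation on the remaining 85% instead of using a fixed test split. A minimal sketch of that scheme with scikit-learn (which the paper mentions) is below; the placeholder arrays and random seeds are assumptions.

```python
# Minimal sketch of the splitting scheme: 15% development set, 10-fold CV on the rest.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

rng = np.random.RandomState(0)
X = rng.randn(27088, 300)          # placeholder feature matrix (e.g., concatenated embeddings)
y = rng.randint(0, 2, size=27088)  # placeholder binary argument/adjunct labels

# Hold out 15% as a fixed development set.
X_rest, X_dev, y_rest, y_dev = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y)

# 10-fold cross-validation over the remaining 85% rather than a fixed test split.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_rest)):
    X_train, X_test = X_rest[train_idx], X_rest[test_idx]
    y_train, y_test = y_rest[train_idx], y_rest[test_idx]
    # ... fit a model on (X_train, y_train) and evaluate on (X_test, y_test) ...
```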
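
The Experiment Setup row specifies Adadelta, cross-entropy loss, batch size 32, and interaction terms between features. The PyTorch sketch below illustrates such a configuration; the layer sizes, epoch count, and the particular interaction terms used (additive, subtractive, elementwise product) are assumptions, not the authors' exact settings.

```python
# Minimal sketch: Adadelta + cross-entropy training over embedding interaction features.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

dim = 300
verb = torch.randn(1000, dim)          # placeholder verb embeddings
prep = torch.randn(1000, dim)          # placeholder preposition embeddings
labels = torch.randint(0, 2, (1000,))  # placeholder argument/adjunct labels

# Interaction terms between the two embeddings, concatenated with the originals.
features = torch.cat([verb, prep, verb + prep, verb - prep, verb * prep], dim=1)

model = nn.Sequential(nn.Linear(5 * dim, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.Adadelta(model.parameters())   # Adadelta (Zeiler 2012)
loss_fn = nn.CrossEntropyLoss()                        # cross-entropy loss
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```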