Analyzing Compositionality-Sensitivity of NLI Models
Authors: Yixin Nie, Yicheng Wang, Mohit Bansal
AAAI 2019, pp. 6867-6874
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models' compositional understanding. |
| Researcher Affiliation | Academia | Yixin Nie,* Yicheng Wang,* Mohit Bansal Department of Computer Science University of North Carolina at Chapel Hill {yixin1, yicheng, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The code will be released at https://github.com/easonnie/analyze-compositionality-sensitivity-NLI. |
| Open Datasets | Yes | Large annotated datasets such as the Stanford Natural Language Inference Bowman et al. (2015) (SNLI) and the Multi-Genre Natural Language Inference Williams, Nangia, and Bowman (2018) (MNLI) have promoted the development of many different neural NLI models... |
| Dataset Splits | Yes | The results of these models and their corresponding variants on SNLI, MNLI matched, and MNLI mismatched development set are shown in Table 3, where we see that their performance is not too far from that of their original, recurrent counterparts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | To create the adversarial data, we used the Stanford Parser Chen and Manning (2014) from CoreNLP 3.8.0 to get the dependency parse of the sentences, on which we apply our strategies. |
| Experiment Setup | Yes | We add 20,000 adversarial examples into training at each epoch. |
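The experiment-setup quote ("We add 20,000 adversarial examples into training at each epoch") can be sketched as a per-epoch data-mixing step. This is a minimal illustration, not the paper's implementation; the function name `mix_adversarial` and the data layout are hypothetical.

```python
import random

def mix_adversarial(train_data, adversarial_pool, n_adv=20000, seed=0):
    """Build one epoch's training set: the original examples plus a
    random sample of up to n_adv adversarial examples (hypothetical
    helper mirroring the paper's 20,000-per-epoch setup)."""
    rng = random.Random(seed)
    sampled = rng.sample(adversarial_pool, min(n_adv, len(adversarial_pool)))
    epoch_data = train_data + sampled
    rng.shuffle(epoch_data)  # shuffle so adversarial examples are interleaved
    return epoch_data

# Toy illustration: 5 original examples plus 2 sampled adversarial ones.
train = [{"premise": f"p{i}", "label": "entailment"} for i in range(5)]
adv = [{"premise": f"a{i}", "label": "contradiction"} for i in range(3)]
epoch = mix_adversarial(train, adv, n_adv=2)
print(len(epoch))  # 7
```

Resampling each epoch (rather than concatenating once) keeps the ratio of adversarial to original data fixed while varying which adversarial examples the model sees.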