Analyzing Compositionality-Sensitivity of NLI Models

Authors: Yixin Nie, Yicheng Wang, Mohit Bansal

AAAI 2019, pp. 6867-6874

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models' compositional understanding." |
| Researcher Affiliation | Academia | "Yixin Nie,* Yicheng Wang,* Mohit Bansal, Department of Computer Science, University of North Carolina at Chapel Hill, {yixin1, yicheng, mbansal}@cs.unc.edu" |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | "The code will be released at https://github.com/easonnie/analyze-compositionality-sensitivity-NLI." |
| Open Datasets | Yes | "Large annotated datasets such as the Stanford Natural Language Inference (Bowman et al. 2015) (SNLI) and the Multi-Genre Natural Language Inference (Williams, Nangia, and Bowman 2018) (MNLI) have promoted the development of many different neural NLI models..." |
| Dataset Splits | Yes | "The results of these models and their corresponding variants on the SNLI, MNLI matched, and MNLI mismatched development sets are shown in Table 3, where we see that their performance is not too far from that of their original, recurrent counterparts." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | "To create the adversarial data, we used the Stanford Parser (Chen and Manning 2014) from CoreNLP 3.8.0 to get the dependency parse of the sentences, on which we apply our strategies." |
| Experiment Setup | Yes | "We add 20,000 adversarial examples into training at each epoch." |
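
The sketches below make three of the rows above concrete. First, the Open Datasets row: SNLI and MNLI are publicly available, and a minimal way to obtain the standard splits today is via the Hugging Face `datasets` library. This library is our assumption for illustration; the authors worked from the original dataset distributions.

```python
# Hedged sketch: loading the public SNLI and MNLI splits with Hugging Face
# `datasets` (an assumption; not the pipeline used in the paper).
from datasets import load_dataset

snli = load_dataset("snli")       # splits: train / validation / test
mnli = load_dataset("multi_nli")  # splits: train / validation_matched / validation_mismatched

# SNLI marks examples without a gold label as -1; filter them out before use.
snli_train = snli["train"].filter(lambda ex: ex["label"] != -1)

print(len(snli_train),
      len(mnli["validation_matched"]),
      len(mnli["validation_mismatched"]))
```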
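Second, the Software Dependencies row: the paper states only that the Stanford Parser from CoreNLP 3.8.0 produced the dependency parses on which the adversarial strategies were applied. A minimal sketch of that step, assuming a locally running CoreNLP server and NLTK's client wrapper (the wrapper is our assumption, not the authors' stated tooling):

```python
# Hedged sketch: dependency parsing via a CoreNLP 3.8.0 server, e.g. started with:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPDependencyParser

parser = CoreNLPDependencyParser(url="http://localhost:9000")
parse, = parser.raw_parse("A man is playing a guitar on stage.")

# Each triple is ((governor, POS), relation, (dependent, POS)); the paper's
# adversarial strategies operate on top of this dependency structure.
for governor, relation, dependent in parse.triples():
    print(governor, relation, dependent)
```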
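Third, the Experiment Setup row: the paper specifies that 20,000 adversarial examples are added into training at each epoch. A minimal sketch of one plausible reading, resampling a fresh adversarial slice per epoch; `train_one_epoch`, `original_train`, and `adversarial_pool` are hypothetical placeholders, not names from the authors' code.

```python
# Hedged sketch: mixing 20,000 adversarial examples into each training epoch.
import random

ADV_PER_EPOCH = 20_000

def train_one_epoch(model, examples):
    """Hypothetical stub for one pass over the given examples."""
    pass

original_train = [...]    # placeholder: full SNLI/MNLI training examples
adversarial_pool = [...]  # placeholder: pre-generated adversarial examples
model = None              # placeholder model

for epoch in range(5):
    # Resample an adversarial slice every epoch, then shuffle it together
    # with the original training data before the epoch's pass.
    adv_slice = random.sample(adversarial_pool,
                              min(ADV_PER_EPOCH, len(adversarial_pool)))
    epoch_data = original_train + adv_slice
    random.shuffle(epoch_data)
    train_one_epoch(model, epoch_data)
```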