Analyzing Compositionality-Sensitivity of NLI Models
Authors: Yixin Nie, Yicheng Wang, Mohit Bansal
AAAI 2019, pp. 6867-6874
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models' compositional understanding. |
| Researcher Affiliation | Academia | Yixin Nie,* Yicheng Wang,* Mohit Bansal Department of Computer Science University of North Carolina at Chapel Hill {yixin1, yicheng, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The code will be released at https://github.com/easonnie/analyze-compositionality-sensitivity-NLI. |
| Open Datasets | Yes | Large annotated datasets such as the Stanford Natural Language Inference Bowman et al. (2015) (SNLI) and the Multi-Genre Natural Language Inference Williams, Nangia, and Bowman (2018) (MNLI) have promoted the development of many different neural NLI models... |
| Dataset Splits | Yes | The results of these models and their corresponding variants on SNLI, MNLI matched, and MNLI mismatched development set are shown in Table 3, where we see that their performance is not too far from that of their original, recurrent counterparts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | To create the adversarial data, we used the Stanford Parser Chen and Manning (2014) from CoreNLP 3.8.0 to get the dependency parse of the sentences, on which we apply our strategies. |
| Experiment Setup | Yes | We add 20,000 adversarial examples into training at each epoch. |
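The experiment-setup quote ("We add 20,000 adversarial examples into training at each epoch") can be sketched as a per-epoch data-mixing step. This is a minimal illustration, not the paper's implementation; the function name `mix_adversarial` and the data layout are hypothetical.

```python
import random

def mix_adversarial(train_data, adversarial_pool, n_adv=20000, seed=0):
    """Build one epoch's training set: the original examples plus a
    random sample of up to n_adv adversarial examples (hypothetical
    helper mirroring the paper's 20,000-per-epoch setup)."""
    rng = random.Random(seed)
    sampled = rng.sample(adversarial_pool, min(n_adv, len(adversarial_pool)))
    epoch_data = train_data + sampled
    rng.shuffle(epoch_data)  # shuffle so adversarial examples are interleaved
    return epoch_data

# Toy illustration: 5 original examples plus 2 sampled adversarial ones.
train = [{"premise": f"p{i}", "label": "entailment"} for i in range(5)]
adv = [{"premise": f"a{i}", "label": "contradiction"} for i in range(3)]
epoch = mix_adversarial(train, adv, n_adv=2)
print(len(epoch))  # 7
```

Resampling each epoch (rather than concatenating once) keeps the ratio of adversarial to original data fixed while varying which adversarial examples the model sees.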