Probing Natural Language Inference Models through Semantic Fragments
Authors: Kyle Richardson, Hai Hu, Lawrence Moss, Ashish Sabharwal
AAAI 2020, pp. 8713-8721
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task; (b) On the other hand, with only a few minutes of additional fine-tuning, with a carefully selected learning rate and a novel variation of inoculation, a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks. |
| Researcher Affiliation | Collaboration | Allen Institute for AI, Seattle, WA, USA; Indiana University, Bloomington, IN, USA. {kyler, ashishs}@allenai.org, {huhai, lmoss}@indiana.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 3 shows rule templates and labeled examples, which are descriptive rather than executable pseudocode. |
| Open Source Code | No | The paper mentions using a third-party library: 'We use the BERT-base uncased model in all experiments, as implemented in Hugging Face: https://github.com/huggingface/pytorch-pretrained-BERT.' However, it does not provide access to the authors' own source code for the methodology described in the paper. A hedged model-loading sketch follows the table. |
| Open Datasets | Yes | Progress in empirical NLI has accelerated due to the introduction of new large-scale NLI datasets, such as the Stanford Natural Language Inference (SNLI) dataset (Bowman et al. 2015) and MultiNLI (MNLI) (Williams, Nangia, and Bowman 2018) |
| Dataset Splits | Yes | For each fragment, we uniformly generated 3,000 training examples and reserved 1,000 examples for testing. ... We also reserve 1,000 for development. A minimal split sketch follows the table. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Hugging Face: https://github.com/huggingface/pytorch-pretrained-BERT' for BERT implementation, implying PyTorch, but does not provide specific version numbers for any software libraries or dependencies (e.g., PyTorch version, Python version, Hugging Face Transformers version). |
| Experiment Setup | No | The paper mentions hyperparameter searches ('We found all models to be sensitive to learning rate, and performed comprehensive hyper-parameter searches to consider different learning rates, # iterations and (for BERT) random seeds') but does not provide the specific hyperparameter values (e.g., exact learning rates, batch sizes, number of epochs) or other system-level training settings used in the experiments. An illustrative sweep sketch follows the table. |
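
To make the Open Source Code and Software Dependencies rows concrete, here is a minimal sketch of loading the model the paper names. The paper cites the pytorch-pretrained-BERT repository, which has since been renamed to `transformers`; the calls below use the current package name, and the 3-way label head is an assumption based on the standard NLI label set (entailment / neutral / contradiction), not a value stated in the paper.

```python
# Minimal sketch, assuming the current Hugging Face `transformers` package
# (successor to the pytorch-pretrained-BERT repository the paper cites).
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=3 is an assumption: the standard NLI label set has three
# classes; the paper does not state this value explicitly.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

# Encode one premise/hypothesis pair the way BERT-based NLI models expect.
inputs = tokenizer("All dogs bark.", "Some dogs bark.", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, 3)
```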
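The Dataset Splits row reports 3,000 training, 1,000 development, and 1,000 test examples per fragment. The sketch below reproduces such a split; the shuffling seed and the assumption that exactly 5,000 generated examples are partitioned are illustrative choices, not details from the paper.

```python
import random

def split_fragment(examples, seed=0):
    """Partition one fragment's generated examples into the 3,000/1,000/1,000
    train/dev/test split the paper describes. The seed and the shuffle are
    assumptions; the paper says only that examples were generated uniformly."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    train = shuffled[:3000]
    dev = shuffled[3000:4000]
    test = shuffled[4000:5000]
    return train, dev, test
```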
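Finally, since the Experiment Setup row notes that the paper describes a search over learning rates, iteration counts, and random seeds without reporting the values used, the sketch below shows only the shape of such a sweep. Every concrete number in it (the grids and the stubbed evaluation) is a hypothetical placeholder, not a reported setting.

```python
import itertools

# Hypothetical search grids; the paper does not report the values it searched.
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 5e-5]
NUM_ITERATIONS = [2, 3, 4]
SEEDS = [7, 13, 42]

def finetune_and_eval(lr, iterations, seed):
    """Hypothetical hook: fine-tune BERT on a fragment's training split with
    these settings and return dev accuracy. Stubbed with a dummy score so
    the sketch runs as written."""
    return 0.0

def best_config():
    # Exhaustively score every (lr, iterations, seed) combination and keep
    # the one with the highest dev accuracy.
    return max(
        itertools.product(LEARNING_RATES, NUM_ITERATIONS, SEEDS),
        key=lambda cfg: finetune_and_eval(*cfg),
    )
```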