Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Authors: Xiaoyan Wang, Pavan Kapanipathi, Ryan Musa, Mo Yu, Kartik Talamadupula, Ibrahim Abdelaziz, Maria Chang, Achille Fokoue, Bassem Makni, Nicholas Mattei, Michael Witbrock. Pages 7208-7215.

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the results of applying our techniques on text, graph, and text-and-graph based models; and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.
Researcher Affiliation | Collaboration | Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA (xiaoyan5@illinois.edu); IBM T.J. Watson Research Center, IBM Research, Yorktown Heights, NY, USA ({ramusa, kapanipa, yum, krtalamad, achille, witbrock}@us.ibm.com; {ibrahim.abdelaziz1, maria.chang, bassem.makni, n.mattei}@ibm.com)
Pseudocode | No | The paper describes the models and processes using textual descriptions and mathematical equations, but does not provide pseudocode or an algorithm block.
Open Source Code | No | The paper states: 'We used the AllenNLP (allennlp.org) library to implement all the models used in the experiments.' It does not provide concrete access to its own source-code implementation.
Open Datasets | Yes | We use the SciTail dataset (Khot, Sabharwal, and Clark 2018), which is a textual entailment dataset derived from publicly released science domain multiple choice question answering datasets (Welbl, Liu, and Gardner 2017; Clark et al. 2016).
Dataset Splits | No | The paper mentions using a 'dev set' and 'test set' and gives the overall dataset size ('27,026 sentence pairs'), but does not specify the exact percentages or counts of the training, validation (dev), and test splits needed for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'AllenNLP' and 'spaCy' but does not specify their version numbers or any other software dependencies with explicit versions.
Experiment Setup | Yes | The system is trained by Adagrad with a learning rate of 0.001 and a batch size of 40. Both the text and graph based models are trained jointly. For the graph model, we use concepts from the Concepts Only graph, which is generated using the approach detailed in Section 3.2. All words in the text model are initialized by 300D GloVe vectors (GloVe 840B 300D, nlp.stanford.edu/projects/glove), and the concepts that act as the input for the graph model are initialized by 300D ConceptNet PPMI vectors (Speer, Chin, and Havasi 2017); these are openly available for ConceptNet. We use the pre-trained embeddings without any fine-tuning.
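Since the paper reports using AllenNLP, the quoted setup could be expressed as a training configuration along the following lines. This is a sketch only: the field names follow AllenNLP 0.x configuration conventions, and the model type, dataset reader, and file path are illustrative placeholders, not the authors' actual configuration.

```jsonnet
{
  // Hypothetical dataset reader and path; SciTail is distributed in an
  // SNLI-style JSONL format, but the authors' actual reader is not specified.
  "dataset_reader": {"type": "snli"},
  "train_data_path": "scitail_1.0_train.txt",
  "model": {
    // Placeholder text model; the paper's match-LSTM + graph architecture
    // would require a custom model registration.
    "type": "decomposable_attention",
    "text_field_embedder": {
      "tokens": {
        "type": "embedding",
        "embedding_dim": 300,
        // 300D GloVe 840B vectors, as quoted above.
        "pretrained_file": "https://nlp.stanford.edu/data/glove.840B.300d.zip",
        "trainable": false  // pre-trained embeddings, no fine-tuning
      }
    }
  },
  "iterator": {"type": "basic", "batch_size": 40},
  "trainer": {
    "optimizer": {"type": "adagrad", "lr": 0.001}
  }
}
```

The `"trainable": false` flag mirrors the statement that pre-trained embeddings are used without fine-tuning; the ConceptNet PPMI vectors for the graph model would be wired in analogously.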