Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Authors: Xiaoyan Wang, Pavan Kapanipathi, Ryan Musa, Mo Yu, Kartik Talamadupula, Ibrahim Abdelaziz, Maria Chang, Achille Fokoue, Bassem Makni, Nicholas Mattei, Michael Witbrock. Pages 7208-7215.

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the results of applying our techniques on text, graph, and text-and-graph based models; and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.
Researcher Affiliation | Collaboration | Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA (xiaoyan5@illinois.edu); IBM T.J. Watson Research Center, IBM Research, Yorktown Heights, NY, USA ({ramusa, kapanipa, yum, krtalamad, achille, witbrock}@us.ibm.com; {ibrahim.abdelaziz1, maria.chang, bassem.makni, n.mattei}@ibm.com)
Pseudocode | No | The paper describes the models and processes using textual descriptions and mathematical equations, but does not provide pseudocode or an algorithm block.
Open Source Code | No | The paper states: 'We used the AllenNLP (allennlp.org) library to implement all the models used in the experiments.' It does not provide concrete access to its own source-code implementation.
Open Datasets | Yes | We use the SciTail dataset (Khot, Sabharwal, and Clark 2018), which is a textual entailment dataset derived from publicly released science domain multiple choice question answering datasets (Welbl, Liu, and Gardner 2017; Clark et al. 2016).
Dataset Splits | No | The paper mentions using a 'dev set' and 'test set' and gives the overall dataset size ('27,026 sentence pairs'), but does not specify the exact percentages or counts of the training, validation (dev), and test splits needed for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'AllenNLP' and 'spaCy' but does not specify their version numbers or any other software dependencies with explicit versions.
Experiment Setup | Yes | The system is trained by Adagrad with a learning rate of 0.001 and a batch size of 40. Both the text and graph based models are trained jointly. For the graph model, we use concepts from the Concepts Only graph, which is generated using the approach detailed in Section 3.2. All words in the text model are initialized by 300D GloVe vectors (GloVe 840B 300D, nlp.stanford.edu/projects/glove), and the concepts that act as the input for the graph model are initialized by 300D ConceptNet PPMI vectors (Speer, Chin, and Havasi 2017); these are openly available for ConceptNet. We use the pre-trained embeddings without any fine-tuning.
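Since the paper reports using AllenNLP, the quoted setup could be expressed as a training configuration along the following lines. This is a sketch only: the field names follow AllenNLP 0.x configuration conventions, and the model type, dataset reader, and file path are illustrative placeholders, not the authors' actual configuration.

```jsonnet
{
  // Hypothetical dataset reader and path; SciTail is distributed in an
  // SNLI-style JSONL format, but the authors' actual reader is not specified.
  "dataset_reader": {"type": "snli"},
  "train_data_path": "scitail_1.0_train.txt",
  "model": {
    // Placeholder text model; the paper's match-LSTM + graph architecture
    // would require a custom model registration.
    "type": "decomposable_attention",
    "text_field_embedder": {
      "tokens": {
        "type": "embedding",
        "embedding_dim": 300,
        // 300D GloVe 840B vectors, as quoted above.
        "pretrained_file": "https://nlp.stanford.edu/data/glove.840B.300d.zip",
        "trainable": false  // pre-trained embeddings, no fine-tuning
      }
    }
  },
  "iterator": {"type": "basic", "batch_size": 40},
  "trainer": {
    "optimizer": {"type": "adagrad", "lr": 0.001}
  }
}
```

The `"trainable": false` flag mirrors the statement that pre-trained embeddings are used without fine-tuning; the ConceptNet PPMI vectors for the graph model would be wired in analogously.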