Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Authors: Pavan Kapanipathi, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang, Kshitij Fadnis, Chulaka Gunasekara, Bassem Makni, Nicholas Mattei, Kartik Talamadupula, Achille Fokoue (pp. 8074-8081)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments & Results In this section, we describe the experiments that we performed to evaluate our approach; the setup, including datasets, models, and implementations; and the results. Table 1 gives an overview of our results. They demonstrate that KES, and thus external knowledge, has the biggest impact on the Breaking NLI test set. |
| Researcher Affiliation | Collaboration | IBM Research, MIT-IBM Watson AI Lab, University of Illinois at Urbana-Champaign, Tulane University |
| Pseudocode | No | The paper describes methods and algorithms textually and with diagrams (e.g., Figure 2, Figure 3) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | We considered the most popular NLI datasets: SNLI (Bowman et al. 2015), SciTail (Khot, Sabharwal, and Clark 2018), and MultiNLI (Williams, Nangia, and Bowman 2018). |
| Dataset Splits | No | The paper refers to 'validation sets' and 'test sets' for standard NLI datasets (SNLI, SciTail, MultiNLI) but does not provide explicit split percentages, absolute sample counts, or direct citations for the exact splits used for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as CPU or GPU models, or detailed specifications for the computing environment used for experiments. |
| Software Dependencies | No | We used the AllenNLP library to implement the models described below (see also Section 2). |
| Experiment Setup | Yes | Training (of the combined models) consisted of 140 epochs with a patience of 20 epochs. Batch size and learning rate over all the experiments remained 64 and 0.0001 to make the models comparable to each other. All GCNs were configured as follows: two edge types (one for edges in ConceptNet and one for the self-loops); 300 dimensions for each embedding across all layers; one convolutional layer; one additional linear layer (after the convolution); and ReLU for all activations. The Personalized PageRank threshold θ for filtering the subgraphs was also tuned as a hyperparameter. We experimented with θ values of 0.2, 0.4, 0.6, and 0.8. |
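
The GCN configuration reported above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation (which was built on AllenNLP): only the reported structure is taken from the paper (two edge types, 300-dimensional embeddings, one convolutional layer, one linear layer, ReLU activations); the toy graph, weight initialization, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reported configuration: 300-dim embeddings, one convolutional layer,
# one linear layer after it, ReLU everywhere, and two edge types
# (ConceptNet edges plus self-loops).
DIM = 300
NUM_EDGE_TYPES = 2

def relu(x):
    return np.maximum(x, 0.0)

def relational_gcn_layer(H, adjacencies, weights):
    """One relational graph convolution: sum messages per edge type,
    each with its own weight matrix, then apply ReLU."""
    out = np.zeros((H.shape[0], weights[0].shape[1]))
    for A, W in zip(adjacencies, weights):
        out += A @ H @ W
    return relu(out)

# Toy subgraph with 4 concept nodes.
n = 4
H = rng.normal(size=(n, DIM))                          # node embeddings
A_conceptnet = rng.integers(0, 2, size=(n, n)).astype(float)
A_selfloop = np.eye(n)                                 # self-loop edge type

W_conv = [rng.normal(scale=0.01, size=(DIM, DIM)) for _ in range(NUM_EDGE_TYPES)]
W_linear = rng.normal(scale=0.01, size=(DIM, DIM))

H1 = relational_gcn_layer(H, [A_conceptnet, A_selfloop], W_conv)  # one conv layer
out = relu(H1 @ W_linear)                                         # one linear layer
print(out.shape)  # (4, 300)
```

In practice the subgraph fed to the GCN would first be filtered by keeping only nodes whose Personalized PageRank score exceeds the tuned threshold θ, per the setup quoted above.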