Recognizing and Justifying Text Entailment Through Distributional Navigation on Definition Graphs

Authors: Vivian Silva, Siegfried Handschuh, André Freitas

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experiments show that the proposed approach presents results comparable to some well-established entailment algorithms, while also meeting Explainable AI requirements, supplying clear explanations which allow the interpretation of the inference model. To evaluate the proposed approach, we run experiments using the BPI dataset and a sample of the Guardian Headlines dataset. The precision, recall and F-measure obtained by the graph navigation algorithm, as well as the baselines, are presented in Table 2." |
| Researcher Affiliation | Academia | Vivian S. Silva¹, André Freitas², Siegfried Handschuh¹; ¹Department of Computer Science and Mathematics, University of Passau, Innstraße 43, 94032 Passau, Germany; ²School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, M13 9PL, UK |
| Pseudocode | Yes | The paper lists the search algorithm. Input: definition graph G; source word S; target word T; threshold η; path length l; max number of paths m. Output: a set of paths from S to T. The listing begins: paths = []; stack = []; new_path = [S]; stack.push(new_path). A runnable sketch of this search is given below the table. |
| Open Source Code | No | The paper provides links to repositories for *components* used (e.g., WordnetGraph: https://github.com/Lambda-3/WordnetGraph, Indra: https://github.com/Lambda-3/Indra, Graphene: https://github.com/Lambda-3/Graphene), but it does not explicitly state that the source code for *their entire methodology or system* described in the paper is openly available. |
| Open Datasets | Yes | "To evaluate the proposed approach, we run experiments using the BPI dataset and a sample of the Guardian Headlines dataset. The BPI dataset..." (http://www.cs.utexas.edu/users/pclark/bpi-test-suite/). "The Guardian Headlines dataset is a set of 32,000 entailment pairs automatically extracted from The Guardian newspaper. Its large size is intended for machine learning purposes... The resulting dataset, herein called Guardian Headlines Sample (GHS), has 400 positive and 400 negative pairs" (https://goo.gl/4iHdbX). |
| Dataset Splits | No | The paper states a train/validation/test split for training a semantic role classifier (used for graph construction): "We used the RNN implementation provided by (Mesnil et al. 2015), and split the data into training (68%), validation (17%) and test (15%) sets." However, it does not provide train/validation/test splits for the main entailment evaluation on the BPI and GHS datasets, nor for the baselines used. |
| Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper names the software and services it uses (e.g., the "RNN implementation provided by (Mesnil et al. 2015)", word2vec (Mikolov et al. 2013), the Indra (Freitas et al. 2016) service, and the Sentence Simplification service (Niklaus et al. 2016) in the information extraction pipeline Graphene), but it does not specify version numbers for these components or for any other key software dependencies. |
| Experiment Setup | No | The paper states that "the algorithm parameters threshold, maximum number of paths and maximum path length (depth limit) were obtained empirically in order to optimize the search," but it does not report the specific values of these parameters, nor any other hyperparameters (e.g., learning rate, batch size, optimizer settings) or detailed system-level training configurations for the main entailment system. |
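
The pseudocode excerpt quoted in the table only initializes the path list and the search stack. Below is a minimal, runnable Python sketch of how such a distributionally pruned, depth-limited path search over a definition graph could look, assuming a stack-based search consistent with the excerpt. The adjacency-dict graph representation, the `similarity` callback (standing in for a word2vec/Indra cosine similarity), and all identifiers are illustrative assumptions, not the authors' released code.

```python
from typing import Callable, Dict, List

def find_paths(
    graph: Dict[str, List[str]],              # definition graph G: word -> defining words
    source: str,                              # source word S
    target: str,                              # target word T
    similarity: Callable[[str, str], float],  # distributional similarity (assumed cosine)
    eta: float,                               # threshold η for expanding a neighbor
    max_len: int,                             # maximum path length l (depth limit)
    max_paths: int,                           # maximum number of paths m
) -> List[List[str]]:
    """Collect up to `max_paths` paths from `source` to `target`, expanding only
    neighbors whose distributional similarity to the target reaches `eta`."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[source]]  # mirrors: new_path = [S]; stack.push(new_path)
    while stack and len(paths) < max_paths:
        path = stack.pop()
        node = path[-1]
        if node == target:           # complete path found: record it as a justification
            paths.append(path)
            continue
        if len(path) >= max_len:     # depth limit l reached: abandon this branch
            continue
        for neighbor in graph.get(node, []):
            # Distributional pruning: only follow definition words close enough
            # to the target; skipping words already on the path avoids cycles.
            if neighbor not in path and similarity(neighbor, target) >= eta:
                stack.append(path + [neighbor])
    return paths

# Toy demo: "cat" is defined via "feline" and "mammal", both leading to "animal".
toy_graph = {"cat": ["feline", "mammal"], "feline": ["animal"], "mammal": ["animal"]}
always_similar = lambda a, b: 1.0  # stand-in for a real embedding similarity
print(find_paths(toy_graph, "cat", "animal", always_similar,
                 eta=0.5, max_len=4, max_paths=2))
# -> [['cat', 'mammal', 'animal'], ['cat', 'feline', 'animal']]
```

The threshold η is what makes this "distributional navigation" rather than plain depth-limited search: branches whose words drift away from the target in embedding space are cut, and each returned path doubles as a human-readable entailment justification.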