Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Knowledge-Based Textual Inference via Parse-Tree Transformations

Authors: Roy Bar-Haim, Ido Dagan, Jonathan Berant

JAIR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The utility of our approach was illustrated on two tasks: unsupervised relation extraction from a large corpus, and the Recognizing Textual Entailment (RTE) benchmarks. We proved the correctness of the new algorithm and established its efficiency analytically and empirically. We evaluate both the quality of the system's output (in terms of accuracy, precision, and recall) and its computational efficiency (in terms of running time and space), using various application settings. The results are reported in Table 4. The results are summarized in Table 5. Table 6 provides statistics on rule applications... The accuracies obtained in this experiment are shown in Table 7... Table 8 provides a more detailed view of our system's performance. Tables 9 and 10 illustrate the usage and contribution of individual rule bases."
Researcher Affiliation | Academia | "Ido Dagan EMAIL Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel; Jonathan Berant EMAIL Computer Science Department, Stanford University"
Pseudocode | Yes | "The formalism is presented in more detail, including further examples and pseudo-code for its algorithms. ... Algorithm 1: Applying a rule to a tree ... Algorithm 2: Applying an inference rule to a compact forest"
Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology.
Open Datasets | Yes | "The utility of our approach was illustrated on two tasks: unsupervised relation extraction from a large corpus, and the Recognizing Textual Entailment (RTE) benchmarks. ... To compare explicit and compact inference we randomly sampled 100 pairs from the RTE-3 development set... Table 6 provides statistics on rule applications using all rule bases, over the RTE-3 development set and the RTE-4 dataset. ... The output of TEASE and DIRT, as well as many other knowledge resources, is available from the RTE knowledge resources page: http://aclweb.org/aclwiki/index.php?title=RTE_Knowledge_Resources"
Dataset Splits | Yes | "To compare explicit and compact inference we randomly sampled 100 pairs from the RTE-3 development set... The system was trained on the RTE-3 development set, and was tested on the RTE-3 and RTE-4 test sets (no development set was released for RTE-4). For this evaluation we randomly sampled 75 pairs from the RTE-3 test set..."
Hardware Specification | No | The paper discusses running time and efficiency but does not specify the hardware used for the experiments (e.g., CPU or GPU models).
Software Dependencies | No | The paper mentions using Minipar for parsing and an SVM classifier, but does not specify version numbers for these components or for any other libraries or frameworks.
Experiment Setup | Yes | "In the current system we implemented a simple search strategy, in the spirit of (de Salvo Braz et al., 2005): first, we applied three exhaustive iterations of generic rules. ... At each iteration we first find all rule matches, and then apply all matched rules. ... We then perform a single iteration of all other lexical and lexical-syntactic rules, applying them only if their L part was matched in F and their R part was matched in h. ... Following inference, a set of features is extracted from the resulting F and from h and fed into an SVM classifier, which determines entailment. Co-reference substitution was disabled due to the insufficient accuracy of the co-reference resolution tool we used."
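The experiment-setup excerpt quoted above describes a two-phase search strategy: several exhaustive iterations of generic rules (find all matches first, then apply them all), followed by a single pass of lexical and lexical-syntactic rules applied only when their L part matches the forest F and their R part matches the hypothesis h. The sketch below illustrates that control flow only; the string-based "forest", the rule dictionaries, and the helper names are simplified stand-ins invented for illustration, not the paper's actual parse-forest machinery.

```python
# Minimal sketch of the two-phase rule-application strategy described
# in the Experiment Setup excerpt. Trees/forests are modeled as plain
# strings and rule matching as substring search (a deliberate
# simplification of the paper's parse-tree transformations).

def find_matches(rules, forest):
    """Collect all rules whose left-hand side (L) matches the forest.

    Matching is done for *all* rules before any rule is applied,
    mirroring 'at each iteration we first find all rule matches,
    and then apply all matched rules'.
    """
    return [rule for rule in rules if rule["L"] in forest]

def apply_inference(forest, hypothesis, generic_rules, lexical_rules,
                    exhaustive_iterations=3):
    # Phase 1: exhaustive iterations of generic rules.
    for _ in range(exhaustive_iterations):
        matched = find_matches(generic_rules, forest)
        for rule in matched:
            forest = forest.replace(rule["L"], rule["R"])

    # Phase 2: a single iteration of the remaining lexical and
    # lexical-syntactic rules, applied only if L matches the forest F
    # AND R matches the hypothesis h.
    for rule in lexical_rules:
        if rule["L"] in forest and rule["R"] in hypothesis:
            forest = forest.replace(rule["L"], rule["R"])

    return forest

# Toy usage: a generic rule fires in phase 1; a lexical rule fires in
# phase 2 because its R part also appears in the hypothesis.
generic = [{"L": "purchased", "R": "bought"}]
lexical = [{"L": "bought", "R": "acquired"}]
result = apply_inference("John purchased a car",
                         "John acquired a car",
                         generic, lexical)
print(result)  # -> John acquired a car
```

In the full system, the transformed forest F and the hypothesis h would then feed a feature extractor and an SVM classifier to decide entailment; that stage is omitted here.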