Open Information Extraction Systems and Downstream Applications

Authors: Mausam

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we first describe a decade of our progress on building Open IE extractors, which results in our latest extractor, OPENIE4, which is computationally efficient, outputs n-ary and nested relations, and also outputs relations mediated by nouns in addition to verbs. We also identify several strengths of the Open IE paradigm, which enable it to be a useful intermediate structure for end tasks. We survey its use in both human-facing applications and downstream NLP tasks, including event schema induction, sentence similarity, text comprehension, learning word vector embeddings, and more. In an evaluation on out-of-domain sentences, OPENIE4 obtains a speed of 52 sentences/sec, which is a little slower than REVERB's 167 sentences/sec, but OPENIE4 has better precision and enormously better yield compared to REVERB (over 4 times AUC, area under the precision-yield curve). Moreover, OPENIE4 obtains 1.32 times AUC compared to OLLIE run with the fast Malt Parser [Nivre et al., 2007], while maintaining the same speed. (A sketch of the precision-yield AUC computation appears after this table.)
Researcher Affiliation | Academia | Mausam, Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India, mausam@cse.iitd.ac.in
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | All our extractors are publicly available and free to use for research purposes. [...] Available at https://github.com/knowitall/openie
Open Datasets | Yes | We run Open IE on a large news corpus (1.8 million articles) and release a novel Relgrams dataset, which lists pairs of relation phrases that frequently co-occur in news [Balasubramanian et al., 2012]. We further run graph clustering over Relgrams to induce a set of common event schemas. [...] Available at http://relgrams.cs.washington.edu (A sketch of the relation-phrase co-occurrence counting appears after this table.)
Dataset Splits | No | The paper mentions evaluating on 'out-of-domain sentences' and 'careful evaluation over Mechanical Turk' but does not specify exact training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper discusses the efficiency and performance of the Open IE systems, mentioning speeds (e.g., '52 sentences/sec') and AUC scores, but it does not specify any particular hardware components (e.g., CPU or GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions ClearNLP's SRL system and the Malt Parser [Nivre et al., 2007] as components used. It also mentions 'OREO' and 'Python' implicitly through a GitHub link. However, it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | No | The paper describes the evolution and characteristics of different Open IE systems (TEXTRUNNER, REVERB, OLLIE, SRLIE, OPENIE4) but does not provide specific details on hyperparameters, training configurations, or system-level settings used during experiments.
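
The Research Type row above quotes the paper's headline comparison in terms of AUC, the area under the precision-yield curve. The snippet below is a minimal sketch of how such a curve can be computed from confidence-scored extraction judgments; the judgments and function names are illustrative placeholders, not the paper's evaluation code or data.

```python
# Minimal sketch: precision-yield curve and its AUC from confidence-scored
# extractions. The judgments in the example are made up, not taken from the
# paper's Mechanical Turk evaluation.

def precision_yield_curve(judgments):
    """judgments: list of (confidence, is_correct) pairs, one per extraction.

    Sweeping the confidence threshold from high to low yields one
    (yield, precision) point per extraction, where yield is the number of
    correct extractions kept so far and precision is the fraction of kept
    extractions that are correct.
    """
    points = []
    correct = 0
    for i, (_, is_correct) in enumerate(
            sorted(judgments, key=lambda j: j[0], reverse=True), start=1):
        correct += int(is_correct)
        points.append((correct, correct / i))  # (yield, precision)
    return points


def auc(points):
    """Trapezoidal area under the precision-yield curve."""
    area = 0.0
    prev_y, prev_p = 0, 1.0  # assumed starting point at zero yield
    for y, p in points:
        area += (y - prev_y) * (p + prev_p) / 2.0
        prev_y, prev_p = y, p
    return area


if __name__ == "__main__":
    fake_judgments = [(0.95, True), (0.90, True), (0.80, False),
                      (0.75, True), (0.60, False), (0.40, True)]
    curve = precision_yield_curve(fake_judgments)
    print(curve)
    print("AUC:", auc(curve))
```

Because yield is an absolute count of correct extractions rather than a normalized rate, a system with comparable precision but several times the yield also has several times the AUC, which is the sense in which the abstract reports "over 4 times AUC" for OPENIE4 versus REVERB.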
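
The Open Datasets row refers to Relgrams, pairs of relation phrases that frequently co-occur in news. The sketch below illustrates only the co-occurrence counting idea under assumed inputs (each article reduced to an ordered list of extracted relation phrases, with an arbitrary window of 5); the released dataset was built with the pipeline of Balasubramanian et al. [2012], not this code.

```python
from collections import Counter


def count_relgrams(documents, window=5):
    """Count co-occurring relation-phrase pairs within a sliding window.

    documents: iterable of lists of relation phrases, one list per article,
    in the order the relations appear in the text. Returns a Counter over
    unordered phrase pairs.
    """
    pair_counts = Counter()
    for relations in documents:
        for i, rel in enumerate(relations):
            for other in relations[i + 1:i + 1 + window]:
                if rel != other:
                    pair_counts[tuple(sorted((rel, other)))] += 1
    return pair_counts


if __name__ == "__main__":
    # Toy input: two "articles" already reduced to their relation phrases.
    docs = [
        ["was arrested in", "was charged with", "pleaded guilty to"],
        ["was arrested in", "was charged with", "was sentenced to"],
    ]
    for pair, count in count_relgrams(docs).most_common(3):
        print(count, pair)
```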