Open Information Extraction Systems and Downstream Applications
Authors: Mausam
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we first describe a decade of our progress on building Open IE extractors, which results in our latest extractor, OPENIE4, which is computationally efficient, outputs n-ary and nested relations, and also outputs relations mediated by nouns in addition to verbs. We also identify several strengths of the Open IE paradigm, which enable it to be a useful intermediate structure for end tasks. We survey its use in both human-facing applications and downstream NLP tasks, including event schema induction, sentence similarity, text comprehension, learning word vector embeddings, and more. In evaluation on out-of-domain sentences, OPENIE4 obtains a speed of 52 sentences/sec, which is a little slower than REVERB's 167 sentences/sec, but OPENIE4 has better precision and enormously better yield compared to REVERB (over 4 times AUC, area under precision-yield curve). Moreover, OPENIE4 obtains 1.32 times AUC compared to OLLIE run with the fast Malt Parser [Nivre et al., 2007], while maintaining the same speed. (Illustrative sketches of an n-ary extraction record and of precision-yield AUC follow the table.) |
| Researcher Affiliation | Academia | Mausam, Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India, mausam@cse.iitd.ac.in |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | All our extractors are publicly available and free to use for research purposes. [...] Available at https://github.com/knowitall/openie |
| Open Datasets | Yes | We run Open IE on a large news corpora (1.8 million articles) and release a novel Relgrams dataset, which lists pairs of relation phrases that frequently co-occur in news [Balasubramanian et al., 2012]. We further run graph clustering over Relgrams to induce a set of common event schemas. [...] Available at http://relgrams.cs.washington.edu (a toy relgram-counting sketch follows the table) |
| Dataset Splits | No | The paper mentions evaluating on 'out-of-domain sentences' and 'careful evaluation over Mechanical Turk' but does not specify exact training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper discusses the efficiency and performance of the Open IE systems, mentioning speeds (e.g., '52 sentences/sec') and AUC scores, but it does not specify any particular hardware components (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'ClearNLP's SRL system' and 'Malt Parser [Nivre et al., 2007]' as components used. It also mentions 'OREO' and 'Python' implicitly through a GitHub link. However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | No | The paper describes the evolution and characteristics of different Open IE systems (TEXTRUNNER, REVERB, OLLIE, SRLIE, OPENIE4) but does not provide specific details on hyperparameters, training configurations, or system-level settings used during experiments. |
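
The 'Research Type' row quotes the abstract's description of OPENIE4 as outputting n-ary and nested relations. As an illustration only, here is a minimal sketch of what an n-ary extraction record could look like; the class and field names are assumptions made for exposition, not the output schema of the knowitall/openie repository.

```python
# Illustration only: a hypothetical record type for an n-ary Open IE extraction.
# Field names are assumptions, not the actual output schema of knowitall/openie.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Extraction:
    arg1: str                                      # first (subject) argument
    relation: str                                  # verb- or noun-mediated relation phrase
    args: List[str] = field(default_factory=list)  # further arguments, giving an n-ary tuple
    confidence: float = 0.0                        # extractor confidence score

# A sentence like "Obama gave a speech in Chicago on Tuesday" could yield:
example = Extraction(
    arg1="Obama",
    relation="gave",
    args=["a speech", "in Chicago", "on Tuesday"],
    confidence=0.9,  # hypothetical value
)
print(example)
```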
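The same row compares extractors by area under the precision-yield curve (AUC). Below is a minimal sketch of computing such an AUC with the trapezoidal rule over (yield, precision) points; the two curves are made-up placeholders, not measurements from the paper.

```python
# Minimal sketch: area under a precision-yield curve via the trapezoidal rule.
# The (yield, precision) points below are made-up placeholders, not the paper's measurements.

def precision_yield_auc(points):
    """points: iterable of (yield, precision) pairs."""
    pts = sorted(points)
    auc = 0.0
    for (y0, p0), (y1, p1) in zip(pts, pts[1:]):
        auc += (y1 - y0) * (p0 + p1) / 2.0  # trapezoid between consecutive points
    return auc

curve_a = [(0, 1.00), (1_000, 0.85), (3_000, 0.70)]   # hypothetical system A
curve_b = [(0, 1.00), (4_000, 0.88), (12_000, 0.72)]  # hypothetical system B
print(precision_yield_auc(curve_b) / precision_yield_auc(curve_a))  # AUC ratio
```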
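The 'Open Datasets' row points to the Relgrams dataset of relation phrases that frequently co-occur in news. Here is a toy sketch of counting such co-occurring pairs within a window over each document's sequence of extracted relations; the window size and counting scheme are assumptions, not the construction described by Balasubramanian et al. [2012].

```python
# Toy sketch: counting co-occurring relation-phrase pairs ("relgrams") within a
# fixed window over each document's sequence of extracted relations.
# The window size and counting scheme are assumptions for illustration.
from collections import Counter

def relgram_counts(docs, window=3):
    """docs: iterable of per-document lists of relation phrases, in order."""
    counts = Counter()
    for relations in docs:
        for i, r1 in enumerate(relations):
            for r2 in relations[i + 1 : i + 1 + window]:
                counts[(r1, r2)] += 1  # ordered pair within the window
    return counts

docs = [["was elected to", "was sworn in as", "announced"],
        ["was sworn in as", "announced"]]  # toy input
print(relgram_counts(docs).most_common(2))
```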