Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Open Information Extraction Systems and Downstream Applications
Authors: Mausam
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we first describe a decade of our progress on building Open IE extractors, which results in our latest extractor, OPENIE4, which is computationally efficient, outputs n-ary and nested relations, and also outputs relations mediated by nouns in addition to verbs. We also identify several strengths of the Open IE paradigm, which enable it to be a useful intermediate structure for end tasks. We survey its use in both human-facing applications and downstream NLP tasks, including event schema induction, sentence similarity, text comprehension, learning word vector embeddings, and more. In evaluation on out-of-domain sentences, OPENIE4 obtains a speed of 52 sentences/sec, which is a little slower than REVERB's 167 sentences/sec, but OPENIE4 has better precision and enormously better yield compared to REVERB (over 4 times AUC, area under precision-yield curve). Moreover, OPENIE4 obtains 1.32 times AUC compared to OLLIE run with the fast Malt Parser [Nivre et al., 2007], while maintaining the same speed. |
| Researcher Affiliation | Academia | Mausam Computer Science and Engineering Indian Institute of Technology Delhi New Delhi, India EMAIL |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | All our extractors are publicly available and free to use for research purposes. [...] Available at https://github.com/knowitall/openie |
| Open Datasets | Yes | We run Open IE on a large news corpora (1.8 million articles) and release a novel Relgrams dataset, which lists pairs of relation phrases that frequently co-occur in news [Balasubramanian et al., 2012]. We further run graph clustering over Relgrams to induce a set of common event schemas. [...] Available at http://relgrams.cs.washington.edu |
| Dataset Splits | No | The paper mentions evaluating on 'out-of-domain sentences' and 'careful evaluation over Mechanical Turk' but does not specify exact training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper discusses the efficiency and performance of the Open IE systems, mentioning speeds (e.g., '52 sentences/sec') and AUC scores, but it does not specify any particular hardware components (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Clear NLP's SRL system' and 'Malt Parser [Nivre et al., 2007]' as components used. It also mentions 'OREO' and 'Python' implicitly through a GitHub link. However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | No | The paper describes the evolution and characteristics of different Open IE systems (TEXTRUNNER, REVERB, OLLIE, SRLIE, OPENIE4) but does not provide specific details on hyperparameters, training configurations, or system-level settings used during experiments. |
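
The Research Type excerpt compares extractors by AUC, the area under the precision-yield curve, and reports ratios such as "over 4 times" and "1.32 times". The paper's exact procedure is not given here, so the following is only a minimal sketch of how such a ratio could be computed, assuming the curve is a list of (yield, precision) points integrated with the trapezoidal rule. The function name `precision_yield_auc` and both sample curves are hypothetical and are not data from the paper.

```python
def precision_yield_auc(points):
    """Approximate the area under a precision-yield curve.

    points: iterable of (yield_count, precision) pairs. This is an
    assumed representation, not the paper's documented format.
    """
    pts = sorted(points)  # order points by increasing yield
    area = 0.0
    # Trapezoidal rule over each pair of consecutive points.
    for (y0, p0), (y1, p1) in zip(pts, pts[1:]):
        area += (y1 - y0) * (p0 + p1) / 2.0
    return area


# Hypothetical curves for illustration only (not results from the paper).
system_a = [(0, 1.0), (100, 0.9), (200, 0.8)]
system_b = [(0, 1.0), (50, 0.8), (100, 0.6)]

auc_a = precision_yield_auc(system_a)
auc_b = precision_yield_auc(system_b)
auc_ratio = auc_a / auc_b  # "times AUC" of system A relative to system B
print(f"AUC A = {auc_a:.1f}, AUC B = {auc_b:.1f}, ratio = {auc_ratio:.2f}")
```

Because yield is an absolute count rather than a normalized recall, a system that extracts more tuples at comparable precision gains area in proportion, which is consistent with the excerpt describing both better precision and much better yield.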