Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distant IE by Bootstrapping Using Lists and Document Structure
Authors: Lidong Bing, Mingyang Ling, Richard Wang, William Cohen
AAAI 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on two corpora, for diseases and drugs, and the results show that this approach significantly improves over a classical distant-supervision approach. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University, Pittsburgh, PA 15213; US Development Center, Baidu USA, Sunnyvale, CA 94089; {lbing@cs, mingyanl@andrew, wcohen@cs}.cmu.edu, EMAIL |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. Algorithms are described in prose. |
| Open Source Code | No | The paper links to ProPPR, a tool used in the work (footnote 3: "https://github.com/TeamCohen/ProPPR"), but provides neither a link nor a release statement for the code of its own method, DIEBOLDS. |
| Open Datasets | Yes | Our target drug corpus, called DailyMed, is downloaded from dailymed.nlm.nih.gov which contains 28,590 XML documents... Our target disease corpus, called WikiDisease, is extracted from a Wikipedia dump of May 2015... The structured drug corpus, called WebMD, is collected from www.webmd.com... The structured disease corpus, called MayoClinic, is collected from www.mayoclinic.org. |
| Dataset Splits | Yes | These triples are split into development set and validating set in the ratio of 9:1. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments were provided. |
| Software Dependencies | No | The paper mentions software components and tools such as "GDep parser", "MultiRankWalk (MRW)", "ProPPR", and "SVM classifier (Chang and Lin 2001)" (with a link to LIBSVM), but does not give version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We adopt an existing multi-class label propagation method, namely, MultiRankWalk (MRW) (Lin and Cohen 2010)... (In the experiments we use α = 0.1.) |
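
The Experiment Setup row names MRW-style label propagation with α = 0.1 but does not say how α enters the computation. As a rough illustration only, the sketch below runs one random walk with restart per class and assigns each node the class whose walk scores it highest. It assumes α is the restart probability, which the excerpt above does not state. The function name, graph, and seed nodes are hypothetical, and this is not the authors' DIEBOLDS implementation.

```python
def multi_rank_walk(adjacency, seeds, alpha=0.1, iterations=100):
    """Label nodes by per-class random walk with restart.

    adjacency: dict mapping node -> list of neighbor nodes (undirected graph).
    seeds: dict mapping class label -> list of seed nodes for that class.
    alpha: ASSUMED here to be the restart probability.
    """
    nodes = sorted(adjacency)
    scores = {}
    for label, seed_nodes in seeds.items():
        # The walk for this class restarts uniformly at the class's seed nodes.
        restart = {
            n: (1.0 / len(seed_nodes) if n in seed_nodes else 0.0) for n in nodes
        }
        rank = dict(restart)
        for _ in range(iterations):
            # Each node passes its current score evenly to its neighbors.
            spread = {n: 0.0 for n in nodes}
            for n in nodes:
                neighbors = adjacency[n]
                if not neighbors:
                    continue
                share = rank[n] / len(neighbors)
                for m in neighbors:
                    spread[m] += share
            # Mix the propagated scores with the restart distribution.
            rank = {
                n: alpha * restart[n] + (1.0 - alpha) * spread[n] for n in nodes
            }
        scores[label] = rank
    # Each node takes the class whose walk gives it the highest score.
    return {n: max(scores, key=lambda lab: scores[lab][n]) for n in nodes}


# Hypothetical toy graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
seeds = {"disease": ["a"], "drug": ["f"]}
labels = multi_rank_walk(graph, seeds, alpha=0.1)
```

With one seed per triangle, each node ends up with the label of the seed on its own side of the bridge. A real reproduction would need the paper's graph construction and seed selection, which this table does not describe.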