Content and Context: Two-Pronged Bootstrapped Learning for Regex-Formatted Entity Extraction
Authors: Stanley Simoes, Deepak P, Munu Sairamesh, Deepak Khemani, Sameep Mehta
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an empirical evaluation over multiple real world document corpora, we illustrate the effectiveness of our approach. We perform our empirical evaluation on a variety of extraction tasks over multiple real-world document corpora as shown in Table 2. |
| Researcher Affiliation | Collaboration | Stanley Simoes, Indian Institute of Technology Madras, stanley@cse.iitm.ac.in; Deepak P, Queen's University Belfast, deepaksp@acm.org; Munu Sairamesh, Indian Institute of Technology Madras, musram@gmail.com; Deepak Khemani, Indian Institute of Technology Madras, khemani@iitm.ac.in; Sameep Mehta, IBM Research India, sameepmehta@in.ibm.com |
| Pseudocode | Yes | Algorithm 1 MATCH-SET-EXPANSION |
| Open Source Code | Yes | [4] Source code available at https://github.com/stanleyts/ContentNContext |
| Open Datasets | Yes | The talk.politics.mideast and misc.forsale corpora are taken from the 20 Newsgroups dataset [6], whereas the Enron corpus is a random subset of 100k documents from the Enron Email Dataset [7]. The WebKB corpus [8] is another popular document dataset. [6] http://qwone.com/~jason/20Newsgroups/ [7] https://www.cs.cmu.edu/~./enron/ [8] http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-51/www/co-training/data/ |
| Dataset Splits | No | The paper references various datasets but does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only vaguely mentions 'server facilities' in the acknowledgments. |
| Software Dependencies | No | The paper describes algorithms and models (e.g., logistic regression, Levenshtein automaton) but does not provide specific software names with version numbers for replication. |
| Experiment Setup | Yes | Our method uses three parameters: d, num, and p. We set these to 4, 150, and 1%, unless otherwise stated. We separately study the performance of our method across variations in these parameters. |