Incorporating Expert Knowledge into Keyphrase Extraction
Authors: Sujatha Das Gollapalli, Xiao-Li Li, Peng Yang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally illustrate that using within-document features alone, our tagger trained with Conditional Random Fields performs on-par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we are also able to harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems to enhance the extraction performance further. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently-compiled datasets of keyphrases from Computer Science research papers. |
| Researcher Affiliation | Collaboration | Sujatha Das Gollapalli, Xiao-Li Li: Institute for Infocomm Research, A*STAR, Singapore ({gollapallis,xlli}@i2r.a-star.edu.sg); Peng Yang: Tencent AI Lab, Shenzhen, China (yangpeng1985521@gmail.com) |
| Pseudocode | No | The paper describes its methods and features, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Processed datasets and code are available upon request. |
| Open Datasets | Yes | We evaluate our models using the research paper datasets collected by recent works on keyphrase extraction (Gollapalli and Caragea 2014). To the best of our knowledge, these datasets comprise the largest, publicly-available benchmark datasets of research paper abstracts containing both author-specified keyphrases and citation network information. |
| Dataset Splits | Yes | We employ 10-fold cross-validation and present (micro) averaged results for all our experiments using the precision, recall, and F1 measures. (A sketch of the pooled micro-averaging arithmetic appears after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'Mallet toolkit (McCallum 2002)' and 'The Stanford Parser (Finkel, Grenager, and Manning 2005)' but does not provide version numbers for these software components, which would be needed for reproducibility. |
| Experiment Setup | Yes | Default parameter settings were used while training the standard CRF models. For posterior regularization, we set the constraint weights to 50 and the number of iterations for the EM-style optimization algorithm to 100. (A hedged training sketch follows this table.) |
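
The Experiment Setup row reports training standard CRF taggers with default parameters, plus posterior regularization with constraint weight 50 and 100 EM iterations. Below is a minimal sketch of the base CRF tagging step, using sklearn-crfsuite as a stand-in for the Mallet toolkit the paper actually uses; the feature functions and the two-label KP/O tag scheme are illustrative assumptions, and the feature-labeling/posterior-regularization step is not reproduced here.

```python
# Minimal KP/O-style CRF keyphrase tagger sketch.
# ASSUMPTIONS: the paper trains its taggers with the Mallet toolkit;
# sklearn-crfsuite is used here only as an accessible stand-in, and the
# within-document features below are illustrative, not the paper's exact set.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple within-document features for the i-th token."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[0].isupper(),
        "is_first": i == 0,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

def featurize(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# Toy abstract sentence; "conditional random fields" is the gold keyphrase.
tokens = ["We", "train", "conditional", "random", "fields", "for", "tagging", "."]
labels = ["O", "O", "KP", "KP", "KP", "O", "O", "O"]  # one plausible encoding

# Default-style settings, mirroring the paper's use of default CRF parameters.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([featurize(tokens)], [labels])
print(crf.predict([featurize(tokens)])[0])
```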
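
The Dataset Splits row reports 10-fold cross-validation with micro-averaged precision, recall, and F1. The sketch below shows only the pooling arithmetic implied by "(micro) averaged": true positive, false positive, and false negative counts are accumulated over all test documents across folds before the measures are computed. The per-document gold and predicted keyphrase sets are hypothetical, and in a real run the tagger would be retrained on each fold's training split.

```python
# Micro-averaged precision/recall/F1 across 10 cross-validation folds.
# ASSUMPTION: gold/pred are hypothetical per-document keyphrase sets; a real
# evaluation would retrain the tagger on each fold's training split.
from sklearn.model_selection import KFold

gold = [{"crf", "keyphrase extraction"}, {"posterior regularization"}] * 10
pred = [{"crf"}, {"posterior regularization", "em"}] * 10

tp = fp = fn = 0
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for _, test_idx in kf.split(gold):
    for i in test_idx:
        tp += len(gold[i] & pred[i])  # correctly extracted keyphrases
        fp += len(pred[i] - gold[i])  # spurious extractions
        fn += len(gold[i] - pred[i])  # missed gold keyphrases

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"micro P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```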