Incorporating Expert Knowledge into Keyphrase Extraction
Authors: Sujatha Das Gollapalli, Xiao-Li Li, Peng Yang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally illustrate that using within-document features alone, our tagger trained with Conditional Random Fields performs on-par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we are also able to harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems to enhance the extraction performance further. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently-compiled datasets of keyphrases from Computer Science research papers. |
| Researcher Affiliation | Collaboration | Sujatha Das Gollapalli, Xiao-Li Li: Institute for Infocomm Research, A*STAR, Singapore ({gollapallis,xlli}@i2r.a-star.edu.sg); Peng Yang: Tencent AI Lab, Shenzhen, China (yangpeng1985521@gmail.com) |
| Pseudocode | No | The paper describes its methods and features, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Processed datasets and code are available upon request. |
| Open Datasets | Yes | We evaluate our models using the research paper datasets collected by recent works on keyphrase extraction (Gollapalli and Caragea 2014). To the best of our knowledge, these datasets comprise the largest, publicly-available benchmark datasets of research paper abstracts containing both author-specified keyphrases and citation network information. |
| Dataset Splits | Yes | We employ 10-fold cross-validation and present (micro) averaged results for all our experiments using the precision, recall, and F1 measures. (A sketch of the pooled micro-averaging arithmetic appears after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'Mallet toolkit (McCallum 2002)' and 'The Stanford Parser (Finkel, Grenager, and Manning 2005)' but does not provide version numbers for these software components, which would be needed for reproducibility. |
| Experiment Setup | Yes | Default parameter settings were used while training the standard CRF models. For posterior regularization, we set the constraint weights to 50 and the number of iterations for the EM-style optimization algorithm to 100. (A hedged training sketch follows this table.) |
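
The Experiment Setup row reports training standard CRF taggers with default parameters, plus posterior regularization with constraint weight 50 and 100 EM iterations. Below is a minimal sketch of the base CRF tagging step, using sklearn-crfsuite as a stand-in for the Mallet toolkit the paper actually uses; the feature functions and the two-label KP/O tag scheme are illustrative assumptions, and the feature-labeling/posterior-regularization step is not reproduced here.

```python
# Minimal KP/O-style CRF keyphrase tagger sketch.
# ASSUMPTIONS: the paper trains its taggers with the Mallet toolkit;
# sklearn-crfsuite is used here only as an accessible stand-in, and the
# within-document features below are illustrative, not the paper's exact set.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple within-document features for the i-th token."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[0].isupper(),
        "is_first": i == 0,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

def featurize(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# Toy abstract sentence; "conditional random fields" is the gold keyphrase.
tokens = ["We", "train", "conditional", "random", "fields", "for", "tagging", "."]
labels = ["O", "O", "KP", "KP", "KP", "O", "O", "O"]  # one plausible encoding

# Default-style settings, mirroring the paper's use of default CRF parameters.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([featurize(tokens)], [labels])
print(crf.predict([featurize(tokens)])[0])
```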
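
The Dataset Splits row reports 10-fold cross-validation with micro-averaged precision, recall, and F1. The sketch below shows only the pooling arithmetic implied by "(micro) averaged": true positive, false positive, and false negative counts are accumulated over all test documents across folds before the measures are computed. The per-document gold and predicted keyphrase sets are hypothetical, and in a real run the tagger would be retrained on each fold's training split.

```python
# Micro-averaged precision/recall/F1 across 10 cross-validation folds.
# ASSUMPTION: gold/pred are hypothetical per-document keyphrase sets; a real
# evaluation would retrain the tagger on each fold's training split.
from sklearn.model_selection import KFold

gold = [{"crf", "keyphrase extraction"}, {"posterior regularization"}] * 10
pred = [{"crf"}, {"posterior regularization", "em"}] * 10

tp = fp = fn = 0
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for _, test_idx in kf.split(gold):
    for i in test_idx:
        tp += len(gold[i] & pred[i])  # correctly extracted keyphrases
        fp += len(pred[i] - gold[i])  # spurious extractions
        fn += len(gold[i] - pred[i])  # missed gold keyphrases

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"micro P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```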