Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficiently Mining High Quality Phrases from Texts

Authors: Bing Li, Xiaochun Yang, Bin Wang, Wei Cui

AAAI 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3× to 29× faster than the state-of-the-art works.
Researcher Affiliation	Academia	Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education; School of Computer Science and Engineering and College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Pseudocode	Yes	Algorithm 1 describes our seed extension based approach. Algorithm 1: SEBA
Open Source Code	No	The paper does not contain any statement about making its source code publicly available.
Open Datasets	No	The paper mentions several datasets (5Conf, APNews, Titles, Abstracts, Wiki phrases benchmark) and provides general URLs for some data sources (e.g., dblp.uni-trier.de/db/), but it does not provide direct links or specific access information for the exact datasets or subsets used in their experiments, nor does it explicitly state they are publicly available with concrete access.
Dataset Splits	No	The paper describes experimental settings and parameters (e.g., significance level α, frequency threshold ft), but it does not specify how the datasets were split into training, validation, or test sets (e.g., percentages or sample counts).
Hardware Specification	Yes	The experiments were run on a PC with an Intel Xeon 3.3GHz 6-Cores CPU X5680 and 24GB memory with a 1TB disk, running Ubuntu (Linux) operating system.
Software Dependencies	Yes	Our algorithms were implemented using Java SE Development Kit 8.
Experiment Setup	Yes	In our experiments, we set significance level α = 0.05 for all data sets. We set average phrase length l = 2.1 and phrase ratio r = 0.2 based on our empirical statistics on data sets.