Efficiently Mining High Quality Phrases from Texts
Authors: Bing Li, Xiaochun Yang, Bin Wang, Wei Cui
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3× to 29× faster than the state-of-the-art works. |
| Researcher Affiliation | Academia | Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education; School of Computer Science and Engineering and College of Information Science and Engineering, Northeastern University, Shenyang 110819, China |
| Pseudocode | Yes | Algorithm 1 describes our seed extension based approach. Algorithm 1: SEBA |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available. |
| Open Datasets | No | The paper mentions several datasets (5Conf, APNews, Titles, Abstracts, and the Wiki phrases benchmark) and gives general URLs for some data sources (e.g., dblp.uni-trier.de/db/), but it provides no direct links or specific access information for the exact datasets or subsets used in the experiments, and it does not explicitly state that they are publicly available. |
| Dataset Splits | No | The paper describes experimental settings and parameters (e.g., significance level α, frequency threshold ft), but it does not specify how the datasets were split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | The experiments were run on a PC with an Intel Xeon 3.3GHz 6-Cores CPU X5680 and 24GB memory with a 1TB disk, running Ubuntu (Linux) operating system. |
| Software Dependencies | Yes | Our algorithms were implemented using Java SE Development Kit 8. |
| Experiment Setup | Yes | In our experiments, we set significance level α = 0.05 for all data sets. We set average phrase length l = 2.1 and phrase ratio r = 0.2 based on our empirical statistics on data sets. |