X-DMM: Fast and Scalable Model Based Text Clustering

Authors: Linwei Li, Liangchen Guo, Zhenying He, Yinan Jing, X. Sean Wang

AAAI 2019, pp. 4197-4204

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of X-DMM on several real-world datasets, and the experimental results show that X-DMM achieves substantial speedup compared with existing state-of-the-art algorithms without clustering accuracy degradation.
Researcher Affiliation | Academia | Linwei Li (1), Liangchen Guo (1), Zhenying He (1,2,3), Yinan Jing (1,2,3), X. Sean Wang (1,2,3); (1) School of Computer Science and Technology, Fudan University; (2) Shanghai Key Lab of Data Science; (3) Shanghai Institute of Intelligent Electronics & Systems, China
Pseudocode | Yes | Algorithm 1: The GSDMM algorithm (...); Algorithm 2: The Metropolis-Hastings algorithm (...); Algorithm 3: Parallel training of DMM. (Hedged sketches of Algorithms 1 and 2 follow the table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for X-DMM.
Open Datasets | Yes | We use four real-world datasets: 20ng, QA, ohsumed, and Reuters. (...) NYTimes article dataset
Dataset Splits | No | The paper mentions a 'validation' step in its algorithm description but gives no details on how validation sets were created or used for hyperparameter tuning (e.g., percentages or sample counts for train/validation/test splits).
Hardware Specification | Yes | The experiments are conducted on a PC with an Intel Core i5-7400 CPU and an Nvidia GeForce GTX 1080 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | Similar to (Yin and Wang 2014), we set α = 0.1 and β = 0.1 for GSDMM. (...) Similar to (Griffiths and Steyvers 2004), we set α = 50/K and β = 0.1. (...) The cluster numbers are set as K listed in Table 4.
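
For context on the Pseudocode row: the table names the GSDMM algorithm (Algorithm 1) without reproducing it. Below is a minimal sketch of one collapsed Gibbs sweep in the style of GSDMM (Yin and Wang 2014), the baseline the paper builds on. The defaults α = β = 0.1 mirror the reported experiment setup; the function name, variable names, and data layout (one Counter of word counts per cluster) are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter

def gsdmm_sweep(docs, z, m, n_kw, n_k, K, V, alpha=0.1, beta=0.1):
    """One collapsed Gibbs sweep over all documents (GSDMM-style sketch).

    docs: list of documents, each a list of word ids
    z:    current cluster assignment per document
    m:    number of documents per cluster
    n_kw: per-cluster word counts (e.g., a list of Counters)
    n_k:  total word count per cluster
    """
    D = len(docs)
    for d, doc in enumerate(docs):
        # Remove document d's statistics from its current cluster.
        k_old = z[d]
        m[k_old] -= 1
        for w in doc:
            n_kw[k_old][w] -= 1
        n_k[k_old] -= len(doc)

        # Unnormalized conditional p(z_d = k | rest) for each cluster;
        # short texts keep these products numerically manageable.
        counts = Counter(doc)
        weights = []
        for k in range(K):
            p = (m[k] + alpha) / (D - 1 + K * alpha)
            for w, c in counts.items():
                for j in range(c):
                    p *= n_kw[k][w] + beta + j
            for i in range(len(doc)):
                p /= n_k[k] + V * beta + i
            weights.append(p)

        # Sample a new cluster and restore the statistics.
        k_new = random.choices(range(K), weights=weights)[0]
        z[d] = k_new
        m[k_new] += 1
        for w in doc:
            n_kw[k_new][w] += 1
        n_k[k_new] += len(doc)
    return z
```

Note that each document costs O(K x document length) per sweep here, which is presumably the cost the paper's Metropolis-Hastings sampler (Algorithm 2) is designed to reduce.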
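Likewise for Algorithm 2: the table only names a Metropolis-Hastings sampler. The following is a generic sketch of the MH accept/reject step for resampling one cluster assignment, under the common assumption (as in alias-method samplers) that proposals are drawn from a cheap, possibly stale distribution q rather than the exact conditional. Every name here is hypothetical, and this is not the paper's exact procedure.

```python
import random

def mh_resample(k_cur, p, q_probs, q_sample, steps=2):
    """A few Metropolis-Hastings steps for one cluster assignment.

    k_cur:    current cluster of the document
    p(k):     unnormalized true conditional for cluster k
    q_probs:  q's probability of each cluster (the proposal)
    q_sample: draws a cluster from q in O(1), e.g., via an alias table
    """
    k = k_cur
    for _ in range(steps):
        k_new = q_sample()
        # Accept k_new with probability min(1, p(k')q(k) / (p(k)q(k'))),
        # which corrects for sampling from q instead of p.
        ratio = (p(k_new) * q_probs[k]) / (p(k) * q_probs[k_new])
        if random.random() < min(1.0, ratio):
            k = k_new
    return k
```

The design point is that q can be rebuilt only occasionally while the acceptance ratio keeps the chain targeting the true conditional, trading a little per-step rejection for a large reduction in per-document sampling cost.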