X-DMM: Fast and Scalable Model Based Text Clustering

Authors: Linwei Li, Liangchen Guo, Zhenying He, Yinan Jing, X. Sean Wang

AAAI 2019, pp. 4197-4204

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of X-DMM on several real-world datasets, and the experimental results show that X-DMM achieves substantial speedup compared with existing state-of-the-art algorithms without clustering accuracy degradation.
Researcher Affiliation | Academia | Linwei Li (1), Liangchen Guo (1), Zhenying He (1,2,3), Yinan Jing (1,2,3), X. Sean Wang (1,2,3); (1) School of Computer Science and Technology, Fudan University; (2) Shanghai Key Lab of Data Science; (3) Shanghai Institute of Intelligent Electronics & Systems, China
Pseudocode | Yes | Algorithm 1: The GSDMM algorithm (...); Algorithm 2: The Metropolis-Hastings algorithm (...); Algorithm 3: Parallel training of DMM. (Hedged sketches of Algorithms 1 and 2 follow the table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for X-DMM.
Open Datasets | Yes | We use four real-world datasets: 20ng, QA, ohsumed, and Reuters. (...) NYTimes article dataset
Dataset Splits | No | The paper mentions a 'validation' step in its algorithm description but gives no details on how validation sets were created or used for hyperparameter tuning (e.g., percentages or sample counts for train/validation/test splits).
Hardware Specification | Yes | The experiments are conducted on a PC with an Intel Core i5-7400 CPU and an Nvidia GeForce GTX 1080 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | Similar to (Yin and Wang 2014), we set α = 0.1 and β = 0.1 for GSDMM. (...) Similar to (Griffiths and Steyvers 2004), we set α = 50/K and β = 0.1. (...) The cluster numbers are set as K listed in Table 4.
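
For context on the Pseudocode row: the table names the GSDMM algorithm (Algorithm 1) without reproducing it. Below is a minimal sketch of one collapsed Gibbs sweep in the style of GSDMM (Yin and Wang 2014), the baseline the paper builds on. The defaults α = β = 0.1 mirror the reported experiment setup; the function name, variable names, and data layout (one Counter of word counts per cluster) are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter

def gsdmm_sweep(docs, z, m, n_kw, n_k, K, V, alpha=0.1, beta=0.1):
    """One collapsed Gibbs sweep over all documents (GSDMM-style sketch).

    docs: list of documents, each a list of word ids
    z:    current cluster assignment per document
    m:    number of documents per cluster
    n_kw: per-cluster word counts (e.g., a list of Counters)
    n_k:  total word count per cluster
    """
    D = len(docs)
    for d, doc in enumerate(docs):
        # Remove document d's statistics from its current cluster.
        k_old = z[d]
        m[k_old] -= 1
        for w in doc:
            n_kw[k_old][w] -= 1
        n_k[k_old] -= len(doc)

        # Unnormalized conditional p(z_d = k | rest) for each cluster;
        # short texts keep these products numerically manageable.
        counts = Counter(doc)
        weights = []
        for k in range(K):
            p = (m[k] + alpha) / (D - 1 + K * alpha)
            for w, c in counts.items():
                for j in range(c):
                    p *= n_kw[k][w] + beta + j
            for i in range(len(doc)):
                p /= n_k[k] + V * beta + i
            weights.append(p)

        # Sample a new cluster and restore the statistics.
        k_new = random.choices(range(K), weights=weights)[0]
        z[d] = k_new
        m[k_new] += 1
        for w in doc:
            n_kw[k_new][w] += 1
        n_k[k_new] += len(doc)
    return z
```

Note that each document costs O(K x document length) per sweep here, which is presumably the cost the paper's Metropolis-Hastings sampler (Algorithm 2) is designed to reduce.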
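Likewise for Algorithm 2: the table only names a Metropolis-Hastings sampler. The following is a generic sketch of the MH accept/reject step for resampling one cluster assignment, under the common assumption (as in alias-method samplers) that proposals are drawn from a cheap, possibly stale distribution q rather than the exact conditional. Every name here is hypothetical, and this is not the paper's exact procedure.

```python
import random

def mh_resample(k_cur, p, q_probs, q_sample, steps=2):
    """A few Metropolis-Hastings steps for one cluster assignment.

    k_cur:    current cluster of the document
    p(k):     unnormalized true conditional for cluster k
    q_probs:  q's probability of each cluster (the proposal)
    q_sample: draws a cluster from q in O(1), e.g., via an alias table
    """
    k = k_cur
    for _ in range(steps):
        k_new = q_sample()
        # Accept k_new with probability min(1, p(k')q(k) / (p(k)q(k'))),
        # which corrects for sampling from q instead of p.
        ratio = (p(k_new) * q_probs[k]) / (p(k) * q_probs[k_new])
        if random.random() < min(1.0, ratio):
            k = k_new
    return k
```

The design point is that q can be rebuilt only occasionally while the acceptance ratio keeps the chain targeting the true conditional, trading a little per-step rejection for a large reduction in per-document sampling cost.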