The Sparse MinMax k-Means Algorithm for High-Dimensional Clustering

Authors: Sayak Dey, Swagatam Das, Rammohan Mallipeddi

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The efficacy of the proposal is showcased through comparison against a few representative clustering methods over several real world datasets."
Researcher Affiliation | Collaboration | Sayak Dey (Samsung Research Institute, Bangalore, 560037, India); Swagatam Das (Indian Statistical Institute, Kolkata, 700108, India); Rammohan Mallipeddi (Kyungpook National University, Daegu, 41566, Republic of Korea).
Pseudocode | Yes | Algorithm 1: Sparse MinMax k-Means.
Input: data matrix X and number of clusters K. Parameter: tuning parameter s. Output: clusters C_1, C_2, ..., C_K.
1: Initialize ω as $\omega_1 = \dots = \omega_p = 1/\sqrt{p}$.
2: while the stopping criterion (17) is not satisfied do
3: Optimize (12) with respect to C_1, C_2, ..., C_K, keeping w and ω fixed; that is, solve
$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \frac{w_k}{n_k} \sum_{x_i, x_{i'} \in C_k} \sum_{j=1}^{p} \omega_j \,(x_{ij} - x_{i'j})^2$.
Maximizing (10) is the same as minimizing (9).
4: Optimize (12) with respect to w, keeping C_1, C_2, ..., C_K and ω fixed; that is, solve
$\max_{w_1,\dots,w_K} \sum_{k=1}^{K} \frac{w_k}{n_k} \sum_{x_i, x_{i'} \in C_k} \sum_{j=1}^{p} \omega_j \,(x_{ij} - x_{i'j})^2$.
Minimizing (10) is the same as maximizing (9).
5: Optimize (12) with respect to ω, keeping C_1, C_2, ..., C_K and w_1, w_2, ..., w_K fixed, which yields the optimization problem stated in (13); it can be solved using the Proposition on page 715 of [Witten and Tibshirani, 2010] to obtain ω_new.
6: end while
7: return the clusters C_1, C_2, ..., C_K, the cluster weights w_1, w_2, ..., w_K, and the corresponding feature weights ω_1, ω_2, ..., ω_p.
(A NumPy sketch of this alternating scheme, with the unquoted details filled in as assumptions, follows the table.)
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the proposed methodology is openly available.
Open Datasets | Yes | "We compare the performance of our approach with other well-known high-dimensional clustering algorithms through extensive experiments on several real world datasets (especially, the high-dimensional gene microarray datasets)", citing [Jin and Wang, 2014] Jiashun Jin and Wanjie Wang, Gene microarray data sets, 2014, and [Bache and Lichman, 2013] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013.
Dataset Splits | No | The paper mentions the use of several datasets for experiments but does not provide specific information on the percentages, sample counts, or methodology of training, validation, and test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, specific libraries) used in the implementation or experimentation.
Experiment Setup | No | The paper discusses the selection of the tuning parameters α and s, and mentions a precision level ϵ for the stopping criterion ("ϵ was chosen as 10^{-4} as suggested in [Witten and Tibshirani, 2010]"; see the note after the table), but it does not provide comprehensive experimental setup details such as specific hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other system-level training configurations.
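To make the alternating scheme of Algorithm 1 concrete, below is a minimal NumPy sketch. It is an illustration, not the authors' implementation: the MinMax exponent p_exp, the centroid-based distance computation, the closed-form cluster-weight update (borrowed from MinMax k-means), and the weighted between-cluster score fed to the ω-step are assumptions filled in from the cited building blocks, since the paper's equations (9)-(13) and (17) are not reproduced here; only the ω-step follows the Proposition of [Witten and Tibshirani, 2010].

```python
import numpy as np

def solve_feature_weights(a, s):
    """Maximize omega . a subject to ||omega||_2 <= 1, ||omega||_1 <= s,
    omega >= 0, via soft-thresholding with a binary search on the
    threshold (Proposition of [Witten and Tibshirani, 2010])."""
    a = np.maximum(a, 0.0)                  # negative scores get zero weight
    if a.max() == 0.0:
        return np.ones_like(a) / np.sqrt(len(a))
    omega = a / np.linalg.norm(a)           # try threshold delta = 0 first
    if omega.sum() <= s:
        return omega
    lo, hi = 0.0, a.max()
    for _ in range(60):                     # binary search for delta
        delta = 0.5 * (lo + hi)
        omega = np.maximum(a - delta, 0.0)
        omega /= np.linalg.norm(omega)
        if omega.sum() > s:
            lo = delta
        else:
            hi = delta
    return omega

def sparse_minmax_kmeans(X, k, s, p_exp=0.5, eps=1e-4, max_iter=100, seed=0):
    """Sketch of the alternating scheme; p_exp is an assumed MinMax exponent."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    omega = np.ones(p) / np.sqrt(p)         # uniform feature weights
    w = np.ones(k) / k                      # uniform cluster weights
    centers = X[rng.choice(n, size=k, replace=False)]
    for _ in range(max_iter):
        omega_old = omega.copy()
        # Step 3: update clusters with w and omega fixed; each point joins the
        # cluster minimizing the feature- and cluster-weighted distance.
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * omega).sum(axis=2)
        labels = np.argmin(d * (w ** p_exp), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else X[rng.integers(n)] for c in range(k)])
        # Step 4: update cluster weights with clusters and omega fixed; closed
        # form from MinMax k-means: w_c proportional to V_c^(1/(1-p_exp)).
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * omega).sum(axis=2)
        V = np.maximum(np.array([d[labels == c, c].sum() for c in range(k)]), 1e-12)
        w = V ** (1.0 / (1.0 - p_exp))
        w /= w.sum()
        # Step 5: update omega from a per-feature between-cluster score
        # (total dispersion minus cluster-weighted within-cluster dispersion).
        total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within = np.zeros(p)
        for c in range(k):
            Xc = X[labels == c]
            if len(Xc):
                within += w[c] * ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        omega = solve_feature_weights(total - within, s)
        # Stopping criterion: relative L1 change of omega below eps.
        if np.abs(omega - omega_old).sum() / np.abs(omega_old).sum() < eps:
            break
    return labels, w, omega
```

For example, `labels, w, omega = sparse_minmax_kmeans(X, k=3, s=2.0)` on a standardized data matrix X; features assigned omega_j = 0 are effectively excluded from the clustering, which is the point of the L1 bound s.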
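On the stopping rule: the paper refers to criterion (17) with ϵ = 10^{-4} "as suggested in [Witten and Tibshirani, 2010]". Assuming it matches the convergence rule of that reference, a plausible form is the relative L1 change of the feature weights between iterations:

```latex
\frac{\sum_{j=1}^{p} \bigl|\omega_j^{(t)} - \omega_j^{(t-1)}\bigr|}
     {\sum_{j=1}^{p} \bigl|\omega_j^{(t-1)}\bigr|} < \epsilon,
\qquad \epsilon = 10^{-4}.
```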