The Sparse MinMax k-Means Algorithm for High-Dimensional Clustering

Authors: Sayak Dey, Swagatam Das, Rammohan Mallipeddi

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The efficacy of the proposal is showcased through comparison against a few representative clustering methods over several real world datasets."
Researcher Affiliation | Collaboration | Sayak Dey (Samsung Research Institute, Bangalore, 560037, India); Swagatam Das (Indian Statistical Institute, Kolkata, 700108, India); Rammohan Mallipeddi (Kyungpook National University, Daegu, 41566, Republic of Korea).
Pseudocode | Yes | Algorithm 1: Sparse MinMax k-Means.
Input: data matrix X and number of clusters K. Parameter: tuning parameter s. Output: clusters C_1, C_2, ..., C_K.
1: Initialize ω as $\omega_1 = \dots = \omega_p = 1/\sqrt{p}$.
2: while the stopping criterion (17) is not satisfied do
3: Optimize (12) with respect to C_1, C_2, ..., C_K, keeping w and ω fixed; that is, solve
$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \frac{w_k}{n_k} \sum_{x_i, x_{i'} \in C_k} \sum_{j=1}^{p} \omega_j \,(x_{ij} - x_{i'j})^2$.
Maximizing (10) is the same as minimizing (9).
4: Optimize (12) with respect to w, keeping C_1, C_2, ..., C_K and ω fixed; that is, solve
$\max_{w_1,\dots,w_K} \sum_{k=1}^{K} \frac{w_k}{n_k} \sum_{x_i, x_{i'} \in C_k} \sum_{j=1}^{p} \omega_j \,(x_{ij} - x_{i'j})^2$.
Minimizing (10) is the same as maximizing (9).
5: Optimize (12) with respect to ω, keeping C_1, C_2, ..., C_K and w_1, w_2, ..., w_K fixed, which yields the optimization problem stated in (13); it can be solved using the Proposition on page 715 of [Witten and Tibshirani, 2010] to obtain ω_new.
6: end while
7: return the clusters C_1, C_2, ..., C_K, the cluster weights w_1, w_2, ..., w_K, and the corresponding feature weights ω_1, ω_2, ..., ω_p.
(A NumPy sketch of this alternating scheme, with the unquoted details filled in as assumptions, follows the table.)
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the proposed methodology is openly available.
Open Datasets | Yes | "We compare the performance of our approach with other well-known high-dimensional clustering algorithms through extensive experiments on several real world datasets (especially, the high-dimensional gene microarray datasets)", citing [Jin and Wang, 2014] Jiashun Jin and Wanjie Wang, Gene microarray data sets, 2014, and [Bache and Lichman, 2013] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013.
Dataset Splits | No | The paper mentions the use of several datasets for experiments but does not provide specific information on the percentages, sample counts, or methodology of training, validation, and test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, specific libraries) used in the implementation or experimentation.
Experiment Setup | No | The paper discusses the selection of the tuning parameters α and s, and mentions a precision level ϵ for the stopping criterion ("ϵ was chosen as 10^{-4} as suggested in [Witten and Tibshirani, 2010]"; see the note after the table), but it does not provide comprehensive experimental setup details such as specific hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other system-level training configurations.
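To make the alternating scheme of Algorithm 1 concrete, below is a minimal NumPy sketch. It is an illustration, not the authors' implementation: the MinMax exponent p_exp, the centroid-based distance computation, the closed-form cluster-weight update (borrowed from MinMax k-means), and the weighted between-cluster score fed to the ω-step are assumptions filled in from the cited building blocks, since the paper's equations (9)-(13) and (17) are not reproduced here; only the ω-step follows the Proposition of [Witten and Tibshirani, 2010].

```python
import numpy as np

def solve_feature_weights(a, s):
    """Maximize omega . a subject to ||omega||_2 <= 1, ||omega||_1 <= s,
    omega >= 0, via soft-thresholding with a binary search on the
    threshold (Proposition of [Witten and Tibshirani, 2010])."""
    a = np.maximum(a, 0.0)                  # negative scores get zero weight
    if a.max() == 0.0:
        return np.ones_like(a) / np.sqrt(len(a))
    omega = a / np.linalg.norm(a)           # try threshold delta = 0 first
    if omega.sum() <= s:
        return omega
    lo, hi = 0.0, a.max()
    for _ in range(60):                     # binary search for delta
        delta = 0.5 * (lo + hi)
        omega = np.maximum(a - delta, 0.0)
        omega /= np.linalg.norm(omega)
        if omega.sum() > s:
            lo = delta
        else:
            hi = delta
    return omega

def sparse_minmax_kmeans(X, k, s, p_exp=0.5, eps=1e-4, max_iter=100, seed=0):
    """Sketch of the alternating scheme; p_exp is an assumed MinMax exponent."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    omega = np.ones(p) / np.sqrt(p)         # uniform feature weights
    w = np.ones(k) / k                      # uniform cluster weights
    centers = X[rng.choice(n, size=k, replace=False)]
    for _ in range(max_iter):
        omega_old = omega.copy()
        # Step 3: update clusters with w and omega fixed; each point joins the
        # cluster minimizing the feature- and cluster-weighted distance.
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * omega).sum(axis=2)
        labels = np.argmin(d * (w ** p_exp), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else X[rng.integers(n)] for c in range(k)])
        # Step 4: update cluster weights with clusters and omega fixed; closed
        # form from MinMax k-means: w_c proportional to V_c^(1/(1-p_exp)).
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * omega).sum(axis=2)
        V = np.maximum(np.array([d[labels == c, c].sum() for c in range(k)]), 1e-12)
        w = V ** (1.0 / (1.0 - p_exp))
        w /= w.sum()
        # Step 5: update omega from a per-feature between-cluster score
        # (total dispersion minus cluster-weighted within-cluster dispersion).
        total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within = np.zeros(p)
        for c in range(k):
            Xc = X[labels == c]
            if len(Xc):
                within += w[c] * ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        omega = solve_feature_weights(total - within, s)
        # Stopping criterion: relative L1 change of omega below eps.
        if np.abs(omega - omega_old).sum() / np.abs(omega_old).sum() < eps:
            break
    return labels, w, omega
```

For example, `labels, w, omega = sparse_minmax_kmeans(X, k=3, s=2.0)` on a standardized data matrix X; features assigned omega_j = 0 are effectively excluded from the clustering, which is the point of the L1 bound s.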
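On the stopping rule: the paper refers to criterion (17) with ϵ = 10^{-4} "as suggested in [Witten and Tibshirani, 2010]". Assuming it matches the convergence rule of that reference, a plausible form is the relative L1 change of the feature weights between iterations:

```latex
\frac{\sum_{j=1}^{p} \bigl|\omega_j^{(t)} - \omega_j^{(t-1)}\bigr|}
     {\sum_{j=1}^{p} \bigl|\omega_j^{(t-1)}\bigr|} < \epsilon,
\qquad \epsilon = 10^{-4}.
```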