The Sparse MinMax k-Means Algorithm for High-Dimensional Clustering
Authors: Sayak Dey, Swagatam Das, Rammohan Mallipeddi
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The efficacy of the proposal is showcased through comparison against a few representative clustering methods over several real-world datasets. |
| Researcher Affiliation | Collaboration | Sayak Dey¹, Swagatam Das² and Rammohan Mallipeddi³. ¹Samsung Research Institute, Bangalore, 560037, India; ²Indian Statistical Institute, Kolkata, 700108, India; ³Kyungpook National University, Daegu, 41566, Republic of Korea |
| Pseudocode | Yes | Algorithm 1 Sparse MinMax k-Means Algorithm. Input: data matrix $X$ and number of clusters $k$. Parameter: tuning parameter $s$. Output: clusters $C_1, C_2, \dots, C_k$. 1: Initialize $\omega$ as $\omega_1 = \dots = \omega_p = 1/p$. 2: while stopping criterion (17) is not satisfied do 3: Optimize (12) with respect to $C_1, \dots, C_k$, keeping $w$ and $\omega$ fixed; that is, solve $\min_{C_1,\dots,C_k} \sum_{c=1}^{k} w_c^{\alpha} \frac{1}{n_c} \sum_{i,i' \in C_c} \sum_{j=1}^{p} \omega_j (x_{ij} - x_{i'j})^2$ (maximizing (10) is the same as minimizing (9)). 4: Optimize (12) with respect to $w$, keeping $C_1, \dots, C_k$ and $\omega$ fixed; that is, maximize the same objective over $w_1, \dots, w_k$ (minimizing (10) is the same as maximizing (9)). 5: Optimize (12) with respect to $\omega$, keeping $C_1, \dots, C_k$ and $w_1, \dots, w_k$ fixed, which results in the optimization problem stated in (13) and can be solved using the Proposition stated on page 715 of [Witten and Tibshirani, 2010] to get $\omega^{\text{new}}$. 6: end while. 7: return the clusters $C_1, \dots, C_k$, the cluster weights $w_1, \dots, w_k$, and the feature weights $\omega_1, \dots, \omega_p$. (A hedged NumPy sketch of steps 3–5 appears below the table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the proposed methodology is openly available. |
| Open Datasets | Yes | 'We compare the performance of our approach with other well-known high-dimensional clustering algorithms through extensive experiments on several real-world datasets (especially, the high-dimensional gene microarray datasets)', citing '[Jin and Wang, 2014] Jiashun Jin and Wanjie Wang. Gene microarray data sets, 2014.' and '[Bache and Lichman, 2013] K. Bache and M. Lichman. UCI Machine Learning Repository, 2013.' |
| Dataset Splits | No | The paper mentions the use of several datasets for experiments but does not provide specific information regarding the percentages, sample counts, or methodology for training, validation, and test splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, specific libraries) used in the implementation or experimentation. |
| Experiment Setup | No | The paper discusses the selection of the tuning parameters α and s, and mentions a precision level ϵ for the stopping criterion ('ϵ was chosen as 10⁻⁴ as suggested in [Witten and Tibshirani, 2010]'), but it does not provide comprehensive experimental setup details such as specific hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other system-level training configurations. |
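
The alternating scheme in steps 3–4 of Algorithm 1 admits a compact NumPy sketch. The snippet below is an illustration only: the weighted-distance form $w_c^{\alpha}\,\omega_j\,(x_{ij}-m_{cj})^2$ and the closed-form cluster-weight update $w_c \propto V_c^{1/(1-\alpha)}$ are assumptions modeled on MinMax k-Means (Tzortzis and Likas) and sparse k-Means [Witten and Tibshirani, 2010], not quotations of the paper's objective (12), and all function names are hypothetical.

```python
import numpy as np

def assign_clusters(X, centers, w, omega, alpha):
    """Step 3 (sketch): assign each point to the cluster minimizing the
    cluster- and feature-weighted squared distance. The w_c**alpha * omega_j
    weighting is an assumption modeled on MinMax k-Means, not the paper's (12)."""
    d2 = (X[:, None, :] - centers[None, :, :]) ** 2  # (n, k, p) squared deviations
    dist = (d2 * omega[None, None, :]).sum(axis=2)   # feature-weighted, (n, k)
    dist *= w[None, :] ** alpha                      # cluster-weighted
    return dist.argmin(axis=1)

def update_centers(X, labels, k):
    """Recompute cluster means (empty clusters are not handled in this sketch)."""
    return np.vstack([X[labels == c].mean(axis=0) for c in range(k)])

def update_cluster_weights(X, centers, labels, omega, alpha):
    """Step 4 (sketch): maximize sum_c w_c**alpha * V_c subject to sum_c w_c = 1.
    Lagrangian stationarity gives w_c proportional to V_c**(1/(1-alpha))."""
    k = centers.shape[0]
    V = np.empty(k)
    for c in range(k):
        diff = X[labels == c] - centers[c]
        V[c] = (diff ** 2 * omega).sum()  # feature-weighted within-cluster variance
    V = np.maximum(V, 1e-12)              # guard against empty/degenerate clusters
    w = V ** (1.0 / (1.0 - alpha))        # requires 0 <= alpha < 1
    return w / w.sum()
```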
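Step 5 invokes the Proposition on page 715 of [Witten and Tibshirani, 2010]: the maximizer of $\omega^\top a$ subject to $\|\omega\|_2 \le 1$, $\|\omega\|_1 \le s$, $\omega_j \ge 0$ is a normalized soft-thresholding of $a$, with the threshold found by binary search. A hedged sketch follows; the per-feature scores `a` would be the objective contributions from the paper's problem (13), which are not reproduced here.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator S(a, delta)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def update_feature_weights(a, s, tol=1e-6, max_iter=100):
    """Step 5 (sketch): omega = S(a, delta) / ||S(a, delta)||_2 with delta >= 0
    chosen by binary search so that ||omega||_1 <= s (delta = 0 when the l1
    constraint is already inactive), per Witten and Tibshirani (2010)."""
    omega = a / np.linalg.norm(a)
    if np.abs(omega).sum() <= s:            # l1 constraint inactive: delta = 0
        return omega
    lo, hi = 0.0, np.abs(a).max()           # binary-search the threshold delta
    for _ in range(max_iter):
        delta = 0.5 * (lo + hi)
        thresholded = soft_threshold(a, delta)
        norm = np.linalg.norm(thresholded)
        if norm == 0.0:                     # threshold killed every feature
            hi = delta
            continue
        omega = thresholded / norm
        if np.abs(omega).sum() > s:
            lo = delta                      # still too dense: raise threshold
        else:
            hi = delta                      # feasible: try to tighten
        if hi - lo < tol:
            break
    return omega
```

As for the loop's stopping criterion (17): the paper cites [Witten and Tibshirani, 2010] for the precision ϵ = 10⁻⁴, so a plausible reading, offered here only as an assumption, is the relative change in feature weights, $\sum_j |\omega_j^{\text{new}} - \omega_j^{\text{old}}| \,/\, \sum_j |\omega_j^{\text{old}}| < \epsilon$.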