Simple and Scalable Sparse k-means Clustering via Feature Ranking
Authors: Zhiyue Zhang, Kenneth Lange, Jason Xu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice. |
| Researcher Affiliation | Academia | Zhiyue Zhang (Department of Statistical Science, Duke University), Kenneth Lange (Departments of Computational Medicine, Statistics, and Human Genetics, UCLA), Jason Xu (Department of Statistical Science, Duke University). Correspondence to jason.q.xu@duke.edu |
| Pseudocode | Yes | Algorithm 1 SKFR1 algorithm pseudocode; Algorithm 2 SKFR2 algorithm pseudocode; Algorithm 3 SKFR permutation tuning pseudocode |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | To further validate our proposed algorithm, we compare SKFR1 to widely used peer algorithms on 10 benchmark datasets collected from the Keel, ASU, and UCI machine learning repositories. ... a mice protein expression dataset from a study of murine Down Syndrome [49]. |
| Dataset Splits | No | The paper describes simulation setups and the number of trials and restarts, and uses methods like the gap statistic for parameter tuning, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper mentions 'Julia 1.1 implementation' and reports runtime, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions 'Julia 1.1' and several R packages (sparcl, Gmedian, kpodclustr, wskm) by name, but it does not provide specific version numbers for these R packages, which are key ancillary software components. |
| Experiment Setup | Yes | In all simulations, the number of informative features s is chosen to be 10, and we explore a range of sparsity levels by varying the total number of features p (20, 50, 100, 200, 500, 1000). The SKFR variant and all the competing algorithms are seeded by the k-means++ initialization scheme [18]. We run 30 trials with 20 restarts per trial. We tune SKM's ℓ1 bound parameter over the range [2, 10] by the gap statistic. |
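The simulation design quoted above (s = 10 informative features, a grid of total dimensions p, k-means++ seeding with multiple restarts) can be sketched as follows. This is a minimal illustrative re-creation, not the authors' code: the cluster count, sample sizes, and noise scales are assumptions, and scikit-learn's standard `KMeans` stands in for the paper's Julia implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
s, k, n_per_cluster = 10, 3, 50  # s informative features; k and n are assumed

def simulate(p):
    """Gaussian clusters that differ only in the first s of p features."""
    centers = rng.normal(scale=3.0, size=(k, s))  # assumed separation scale
    X, y = [], []
    for c in range(k):
        informative = centers[c] + rng.normal(size=(n_per_cluster, s))
        noise = rng.normal(size=(n_per_cluster, p - s))  # uninformative features
        X.append(np.hstack([informative, noise]))
        y.extend([c] * n_per_cluster)
    return np.vstack(X), np.array(y)

# Vary total dimension p as in the paper (the full grid goes up to 1000);
# 20 restarts per trial mirrors the quoted setup via n_init=20.
for p in (20, 50, 100):
    X, y = simulate(p)
    km = KMeans(n_clusters=k, init="k-means++", n_init=20, random_state=0).fit(X)
    print(p, round(adjusted_rand_score(y, km.labels_), 3))
```

As p grows with s fixed, the uninformative dimensions increasingly dilute the cluster signal, which is the sparsity regime the SKFR feature-ranking step is designed to handle.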