Simple and Scalable Sparse k-means Clustering via Feature Ranking

Authors: Zhiyue Zhang, Kenneth Lange, Jason Xu

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice." |
| Researcher Affiliation | Academia | Zhiyue Zhang (Department of Statistical Science, Duke University); Kenneth Lange (Departments of Computational Medicine, Statistics, and Human Genetics, UCLA); Jason Xu (Department of Statistical Science, Duke University). Correspondence to jason.q.xu@duke.edu. |
| Pseudocode | Yes | Algorithm 1: SKFR1 pseudocode; Algorithm 2: SKFR2 pseudocode; Algorithm 3: SKFR permutation tuning pseudocode (a hedged sketch of the feature-ranking step follows the table). |
| Open Source Code | No | The paper does not explicitly state that source code is released, nor does it link to a code repository. |
| Open Datasets | Yes | "To further validate our proposed algorithm, we compare SKFR1 to widely used peer algorithms on 10 benchmark datasets collected from the Keel, ASU, and UCI machine learning repositories. ... a mice protein expression dataset from a study of murine Down Syndrome [49]." |
| Dataset Splits | No | The paper describes simulation setups, the number of trials and restarts, and parameter tuning via the gap statistic, but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. |
| Hardware Specification | No | The paper mentions a "Julia 1.1 implementation" and reports runtimes, but it gives no details about the hardware (e.g., CPU or GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper names "Julia 1.1" and several R packages (sparcl, Gmedian, kpodclustr, wskm), but it does not provide version numbers for these R packages, which are key ancillary software components. |
| Experiment Setup | Yes | "In all simulations, the number of informative features s is chosen to be 10, and we explore a range of sparsity levels by varying the total number of features p (20, 50, 100, 200, 500, 1000). The SKFR variant and all the competing algorithms are seeded by the k-means++ initialization scheme [18]. We run 30 trials with 20 restarts per trial. We tune SKM's ℓ1 bound parameter over the range [2, 10] by the gap statistic." (A minimal restart driver sketch also follows the table.) |
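
To make the feature-ranking idea in the Pseudocode row concrete, here is a minimal Python sketch of one SKFR1-style iteration. Since the paper releases no code (see the Open Source Code row), everything below is an assumption: the function name `skfr_step` is hypothetical, and the scoring rule (weighted between-cluster variance per feature) is one plausible reading of "ranking features and keeping the top s", not the authors' exact criterion.

```python
import numpy as np

def skfr_step(X, centers, s):
    """One hypothetical SKFR-style iteration: assign points to the nearest
    center, recompute centers, then keep only the top-s features ranked by
    between-cluster dispersion and reset the rest to the global mean."""
    # Assignment step: nearest center under squared Euclidean distance.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    # Update step: per-cluster means (empty clusters keep their old center).
    k, p = centers.shape
    new_centers = centers.copy()
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            new_centers[j] = members.mean(axis=0)

    # Feature ranking: score each feature by the cluster-size-weighted
    # dispersion of its center coordinates around the global mean.
    counts = np.bincount(labels, minlength=k).astype(float)
    global_mean = X.mean(axis=0)
    scores = (counts[:, None] * (new_centers - global_mean) ** 2).sum(axis=0)
    keep = np.argsort(scores)[-s:]  # indices of the s most informative features

    # Sparsify: uninformative features revert to the global mean, so they
    # contribute nothing to cluster separation in the next assignment step.
    sparse_centers = np.tile(global_mean, (k, 1))
    sparse_centers[:, keep] = new_centers[:, keep]
    return labels, sparse_centers, keep
```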
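The Experiment Setup row reports k-means++ seeding with 20 restarts per trial. A hypothetical driver under those settings might look like the sketch below; the name `run_skfr`, the restart-selection criterion (lowest within-cluster sum of squares over the selected features), and the use of scikit-learn's `kmeans_plusplus` initializer are illustrative assumptions, not the paper's implementation, which was written in Julia 1.1.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

def run_skfr(X, k, s, n_restarts=20, n_iter=50, seed=0):
    """Hypothetical driver mirroring the reported protocol: seed each restart
    with k-means++ and keep the restart with the lowest within-cluster sum of
    squares over the selected features."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        centers, _ = kmeans_plusplus(
            X, n_clusters=k, random_state=int(rng.integers(2**31)))
        for _ in range(n_iter):
            labels, centers, keep = skfr_step(X, centers, s)
        wcss = ((X[:, keep] - centers[labels][:, keep]) ** 2).sum()
        if best is None or wcss < best[0]:
            best = (wcss, labels, centers, keep)
    return best[1:]  # labels, sparse centers, selected feature indices
```

Iterating `skfr_step` a fixed number of times and keeping the best of 20 restarts mirrors the protocol quoted above, but a faithful reproduction would need the paper's actual stopping rule and tuning procedure (e.g., the permutation tuning of Algorithm 3).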