reproducibilityindex.ai

Adapting k-means Algorithms for Outliers

Authors: Christoph Grunau, Václav Rozhoň

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We tested the following algorithms on the datasets kdd (KDD Cup 1999) subsampled to 10 000 points with 38 dimensions and spam (Spambase) with 4601 points in 58 dimensions (DG17). ... The results for this setup for k ∈ {5, 10, ..., 50} are in Figs. 1 and 2, each value being an average over 10 runs.
Researcher Affiliation	Academia	ETH Zürich. Correspondence to: Christoph Grunau <cgrunau@ethz.ch>, Václav Rozhoň <rozhonv@ethz.ch>.
Pseudocode	Yes	Algorithm 1 k-means++ seeding; Algorithm 2 k-means++ (over)seeding with penalties; Algorithm 3 One step of Local-search++; Algorithm 4 Local-search++ with outliers; Algorithm 5 Overseeding from Guha et al. (GMM+03); Algorithm 6 k-means|| overseeding.
Open Source Code	No	The paper does not provide a direct link to its source code or explicitly state that the code is being released publicly.
Open Datasets	Yes	We tested the following algorithms on the datasets kdd (KDD Cup 1999) subsampled to 10 000 points with 38 dimensions and spam (Spambase) with 4601 points in 58 dimensions (DG17).
Dataset Splits	No	The paper mentions subsampled dataset sizes but does not specify training, validation, or test splits, or a cross-validation setup.
Hardware Specification	No	The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers.
Experiment Setup	Yes	We set the number of outliers z to be 10 percent of the dataset. ... To guess the value of Θ in all except the first two algorithms, we tried 10 values from 1 to 10^10, exponentially separated. The best solution was then picked and we followed by running 10 Lloyd iterations on it with the number of outliers for these iterations set to z (the same for the second k-means++ algorithm). The results for this setup for k ∈ {5, 10, ..., 50} are in Figs. 1 and 2, each value being an average over 10 runs.
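
Since no source code is linked, the quoted experiment setup can only be approximated. The following is a minimal sketch of two of its pieces, not the authors' implementation: a grid of 10 exponentially separated guesses for Θ (reading the upper end of the range as 10^10) and 10 Lloyd iterations with z outliers. The function names `theta_grid`, `cost_with_outliers`, and `lloyd_with_outliers` are hypothetical, and the outlier rule used here (drop the z points farthest from their nearest center in every iteration) is an assumption; the quoted text does not say how outliers are handled inside each Lloyd step.

```python
import numpy as np


def theta_grid(num=10, low=1.0, high=1e10):
    """Candidate values for Theta, evenly spaced on a log scale.

    Assumption: "exponentially separated" means a geometric grid.
    """
    return np.logspace(np.log10(low), np.log10(high), num=num)


def _squared_distances(X, centers):
    """Matrix of squared Euclidean distances, shape (n_points, n_centers)."""
    diff = X[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)


def cost_with_outliers(X, centers, z):
    """k-means cost after discarding the z points with the largest cost."""
    nearest = _squared_distances(X, centers).min(axis=1)
    return np.sort(nearest)[: len(X) - z].sum()


def lloyd_with_outliers(X, centers, z, iters=10):
    """Run `iters` Lloyd iterations, ignoring z outliers in each one.

    Assumption: the z points farthest from their nearest center are
    treated as outliers and excluded from the center update.
    """
    centers = np.array(centers, dtype=float)  # copy, do not mutate the input
    n = len(X)
    for _ in range(iters):
        d2 = _squared_distances(X, centers)
        labels = d2.argmin(axis=1)
        nearest = d2[np.arange(n), labels]
        inliers = np.argsort(nearest)[: n - z]
        for j in range(len(centers)):
            members = inliers[labels[inliers] == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = X[members].mean(axis=0)
    return centers
```

Under these assumptions, one run of the quoted protocol would set `z = len(X) // 10`, seed once per value in `theta_grid()`, keep the seeding with the lowest `cost_with_outliers`, and refine it with `lloyd_with_outliers(X, best_centers, z)`. Repeating this for each k in `range(5, 55, 5)` and averaging over 10 runs would mirror the reported setup.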