Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
Authors: Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, Todd Mytkowicz
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a spectrum of problem settings and machines show that Yinyang K-means excels in all the cases, consistently outperforming the classic K-means by an order of magnitude and the fastest prior known K-means algorithms by more than three times on average. Section 5. Experiments. |
| Researcher Affiliation | Collaboration | Yufei Ding, Yue Zhao, Xipeng Shen (Department of Computer Science, North Carolina State University); Madanlal Musuvathi, Todd Mytkowicz (Microsoft Research) |
| Pseudocode | Yes | 3. Algorithm. Putting the group filter, local filter, and new center update algorithm together, we get the complete Yinyang K-means as follows. Step 1: Set t to a value no greater than k/10 and meeting the space constraint. Group the initial centers into t groups, {G_i \| i = 1, 2, ..., t}, by running K-means on just those initial groups for five iterations to produce reasonable groups while incurring little overhead. |
| Open Source Code | Yes | The source code of this work is available at http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2. |
| Open Datasets | Yes | We use eight real world large data sets, four of which are taken from the UCI machine learning repository (Bache & Lichman, 2013), while the other four are commonly used image data sets (Wang et al., 2012; 2013). |
| Dataset Splits | No | The paper mentions using specific datasets for experiments but does not explicitly provide training/test/validation dataset splits, percentages, or cross-validation details needed to reproduce the experiment. |
| Hardware Specification | Yes | Table 2: 'Time and speedup on an Ivybridge machine (16GB memory, 8-core i7-3770K processor)'; Table 3: 'Overall speedup over standard K-means on a Core2 machine (4GB mem, 4-core Core2 CPU)' |
| Software Dependencies | No | The paper mentions software like 'Graphlab (Low et al., 2010)', 'Open CV (Open CV)', and 'mlpack (Curtin et al., 2013)', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The parameter t provides a design knob for controlling the space overhead and redundant distance elimination. ... t is set to k/10 if space allows; o.w., the largest possible value is used. ... Group the initial centers into t groups, {G_i \| i = 1, 2, ..., t}, by running K-means on just those initial groups for five iterations to produce reasonable groups while incurring little overhead. |
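The setup step quoted in the Pseudocode and Experiment Setup rows (grouping the k initial centers into t ≈ k/10 groups by running a few iterations of plain K-means on the centers themselves) can be sketched as follows. This is an illustrative reading of Step 1, not the paper's released implementation; the function name `group_centers` and all defaults are assumptions.

```python
# Hypothetical sketch of Step 1 of Yinyang K-means: partition the k
# initial cluster centers into t groups by running ordinary (Lloyd's)
# K-means on the centers themselves for a handful of iterations.
import numpy as np

def group_centers(centers, t=None, iters=5, seed=0):
    """Group the k centers into t groups; returns (labels, group_centroids)."""
    k, _ = centers.shape
    if t is None:
        t = max(1, k // 10)          # paper: t is set to about k/10
    rng = np.random.default_rng(seed)
    # initialize group centroids from a random subset of the centers
    group_centroids = centers[rng.choice(k, size=t, replace=False)]
    for _ in range(iters):           # paper: five iterations suffice
        # assign each center to its nearest group centroid
        dists = np.linalg.norm(
            centers[:, None, :] - group_centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each group centroid (keep the old one if a group is empty)
        for g in range(t):
            members = centers[labels == g]
            if len(members):
                group_centroids[g] = members.mean(axis=0)
    return labels, group_centroids
```

The resulting group labels are what the group filter operates on in later iterations: a drift bound is maintained per group of centers rather than per center, which is the source of the algorithm's space/pruning trade-off controlled by t.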