Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
Authors: Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, Todd Mytkowicz
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a spectrum of problem settings and machines show that Yinyang K-means excels in all the cases, consistently outperforming the classic K-means by an order of magnitude and the fastest prior known K-means algorithms by more than three times on average. Section 5. Experiments. |
| Researcher Affiliation | Collaboration | Yufei Ding (YDING8@NCSU.EDU), Yue Zhao (YZHAO30@NCSU.EDU), Xipeng Shen (XSHEN5@NCSU.EDU), Department of Computer Science, North Carolina State University; Madanlal Musuvathi (MADANM@MICROSOFT.COM), Todd Mytkowicz (TODDM@MICROSOFT.COM), Microsoft Research |
| Pseudocode | Yes | 3. Algorithm. Putting the group filter, local filter, and new center update algorithm together, we get the complete Yinyang K-means as follows. Step 1: Set t to a value no greater than k/10 and meeting the space constraint. Group the initial centers into t groups, {G_i \| i = 1, 2, ..., t}, by running K-means on just those initial centers for five iterations to produce reasonable groups while incurring little overhead. (A minimal sketch of this grouping step appears after the table.) |
| Open Source Code | Yes | The source code of this work is available at http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2 . |
| Open Datasets | Yes | We use eight real world large data sets, four of which are taken from the UCI machine learning repository (Bache & Lichman, 2013), while the other four are commonly used image data sets (Wang et al., 2012; 2013). |
| Dataset Splits | No | The paper mentions using specific datasets for experiments but does not explicitly provide training/test/validation dataset splits, percentages, or cross-validation details needed to reproduce the experiment. |
| Hardware Specification | Yes | Table 2: 'Time and speedup on an Ivybridge machine (16GB memory, 8-core i7-3770K processor)'; Table 3: 'Overall speedup over standard K-means on a Core2 machine (4GB mem, 4-core Core2 CPU)' |
| Software Dependencies | No | The paper mentions software like 'Graphlab (Low et al., 2010)', 'Open CV (Open CV)', and 'mlpack (Curtin et al., 2013)', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The parameter t provides a design knob for controlling the space overhead and redundant distance elimination. ... t is set to k/10 if space allows; otherwise, the largest possible value is used. ... Group the initial centers into t groups, {G_i \| i = 1, 2, ..., t}, by running K-means on just those initial centers for five iterations to produce reasonable groups while incurring little overhead. |
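
To make the quoted setup concrete, below is a minimal sketch of the center-grouping step (Step 1), not the authors' released implementation (that code is available at the tarball linked above). It assumes NumPy and scikit-learn; the synthetic dataset, the choice of k, and the `group_initial_centers` helper are hypothetical stand-ins used only for illustration.

```python
# Minimal sketch of Step 1 of Yinyang K-means (grouping the initial centers).
# Not the authors' code; assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.cluster import KMeans

def group_initial_centers(centers: np.ndarray, k: int) -> np.ndarray:
    """Group the k initial centers into t ~= k/10 groups by running a short
    K-means (capped at five iterations) on the centers themselves."""
    t = max(1, k // 10)  # paper: t is set to k/10 if space allows
    grouping = KMeans(n_clusters=t, init="random", n_init=1, max_iter=5)
    return grouping.fit_predict(centers)  # group label (G_i index) per center

# Usage: pick k initial centers from the data, then derive the t groups.
rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 16))       # hypothetical dataset
k = 100
initial_centers = data[rng.choice(len(data), size=k, replace=False)]
group_labels = group_initial_centers(initial_centers, k)  # t = 10 groups
```

Capping the grouping K-means at five iterations mirrors the paper's stated goal: produce reasonable groups of centers while keeping the setup overhead negligible relative to the main clustering run.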