K-Means Clustering with Distributed Dimensions

Authors: Hu Ding, Yu Liu, Lingxiao Huang, Jian Li

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.
Researcher Affiliation | Academia | Hu Ding (huding@msu.edu), Computer Science and Engineering, Michigan State University, East Lansing, MI, USA; Yu Liu (liuyujyyz@126.com), Lingxiao Huang (huanglingxiao1990@126.com), Jian Li (lapordge@gmail.com), Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: DISTDIM-K-MEANS; Algorithm 2: GRID
Open Source Code | No | The paper does not provide explicit statements or links for open-source code availability.
Open Datasets | Yes | We first choose a real-world data-set from (Bache & Lichman, 2013), Year Prediction MSD, which contains 10^5 points in R^90. ... we also implement our algorithm DISTDIM-K-MEANS on another data-set, Bag of Words (NYTimes), from (Bache & Lichman, 2013).
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts) for training, validation, or testing, nor does it refer to predefined standard splits for these purposes.
Hardware Specification | No | The paper discusses computation in a distributed setting with 'multiple machines' but does not specify any particular hardware components such as CPU models, GPU models, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions using algorithms from specific research papers (Arthur & Vassilvitskii, 2007; Chawla & Gionis, 2013) as centralized subroutines, but it does not specify any software names with version numbers for implementation or analysis.
Experiment Setup | Yes | We randomly divide the data-set into 3 parties with each having 30 attributes (i.e., T = 3), and set k = 5 to 100. Also, for k-means clustering with outliers we set the number of outliers z = 500.
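
The Pseudocode row names Algorithm 1 (DISTDIM-K-MEANS) and Algorithm 2 (GRID), which the report does not reproduce. The sketch below only illustrates the distributed-dimension setting in generic terms: each party holds a disjoint block of attributes for the same points, clusters its block locally, and ships a compact summary to a coordinator that produces the final labels. The merge step (clustering the concatenated assigned local centers), the function names, and the use of scikit-learn's KMeans are assumptions for illustration, not the paper's algorithm.

```python
# Generic sketch of column-partitioned ("distributed dimension") k-means.
# NOT the paper's Algorithm 1; an assumed local-cluster-then-merge scheme.
import numpy as np
from sklearn.cluster import KMeans


def local_summary(X_local, k, seed=0):
    """One party clusters its own attribute block; returns (centers, labels)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_local)
    return km.cluster_centers_, km.labels_


def distributed_dim_kmeans(party_blocks, k, seed=0):
    """party_blocks: list of (n, d_t) arrays, one per party, same row order."""
    # Each party communicates only k local centers and n labels,
    # never its raw (n, d_t) block.
    summaries = [local_summary(X_t, k, seed) for X_t in party_blocks]

    # Coordinator: represent every point by the concatenation of the local
    # centers it was assigned to, then cluster that representation.
    rep = np.hstack([centers[labels] for centers, labels in summaries])
    final = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(rep)
    return final.labels_
```

The point of a scheme like this is the communication pattern: each party transmits k centers and n cluster labels rather than its raw 30-dimensional block, which is the kind of saving the distributed-dimension setting targets.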
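
As a usage sketch matching the Experiment Setup row, the driver below splits the 90 attributes at random into T = 3 parties of 30 attributes each and sweeps k from 5 to 100. The file name, delimiter, sweep step size, and the trimming of the 500 largest residuals are assumptions; the paper handles outliers with a dedicated k-means-with-outliers subroutine (Chawla & Gionis, 2013), not post-hoc trimming. The `distributed_dim_kmeans` helper from the sketch above is assumed to be in scope.

```python
# Hypothetical experiment driver; assumes distributed_dim_kmeans() from above.
# YearPredictionMSD.txt (UCI) is assumed comma-separated, target in column 0,
# followed by the 90 audio attributes.
import numpy as np

rng = np.random.default_rng(0)
data = np.loadtxt("YearPredictionMSD.txt", delimiter=",")[:, 1:]  # 90 attributes

# Randomly divide the attributes into T = 3 parties of 30 attributes each.
perm = rng.permutation(data.shape[1])
party_blocks = [data[:, perm[t * 30:(t + 1) * 30]] for t in range(3)]
X = np.hstack(party_blocks)  # full 90-dim view, used only to evaluate the cost

for k in range(5, 101, 5):  # sweep k = 5 to 100 (step size is an assumption)
    labels = distributed_dim_kmeans(party_blocks, k)
    # Per-cluster means on the full view; empty clusters are never selected.
    centers = np.zeros((k, X.shape[1]))
    np.add.at(centers, labels, X)
    counts = np.bincount(labels, minlength=k).astype(float)
    centers[counts > 0] /= counts[counts > 0][:, None]
    sq_dist = ((X - centers[labels]) ** 2).sum(axis=1)
    # For the outlier variant the paper sets z = 500; as a crude stand-in,
    # exclude the 500 largest squared distances when reporting the cost.
    print(f"k={k:3d}  trimmed k-means cost = {np.sort(sq_dist)[:-500].sum():.3e}")
```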