K-Means Clustering with Distributed Dimensions
Authors: Hu Ding, Yu Liu, Lingxiao Huang, Jian Li
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting. |
| Researcher Affiliation | Academia | Hu Ding (HUDING@MSU.EDU), Computer Science and Engineering, Michigan State University, East Lansing, MI, USA; Yu Liu (LIUYUJYYZ@126.COM), Lingxiao Huang (HUANGLINGXIAO1990@126.COM), Jian Li (LAPORDGE@GMAIL.COM), Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1 DISTDIM-K-MEANS; Algorithm 2 GRID |
| Open Source Code | No | The paper does not provide explicit statements or links for open-source code availability. |
| Open Datasets | Yes | We first choose a real-world data-set from (Bache & Lichman, 2013), Year Prediction MSD, which contains 10^5 points in R^90. ...we also implement our algorithm DISTDIM-K-MEANS on another data-set, Bag of Words (NYTimes), from (Bache & Lichman, 2013) |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts) for training, validation, or testing, nor does it refer to predefined standard splits for these purposes. |
| Hardware Specification | No | The paper discusses computation in a distributed setting with 'multiple machines' but does not specify any particular hardware components such as CPU models, GPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using algorithms from specific research papers (Arthur & Vassilvitskii, 2007; Chawla & Gionis, 2013) as centralized subroutines, but it does not specify any software names with version numbers for implementation or analysis. |
| Experiment Setup | Yes | We randomly divide the data-set into 3 parties with each having 30 attributes (i.e., T = 3), and set k = 5-100. Also, for k-means clustering with outliers we set the number of outliers z = 500. |
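
For concreteness, the following is a minimal sketch of the distributed-dimension split described in the experiment setup row above. It is not the authors' DISTDIM-K-MEANS implementation: the synthetic data stands in for YearPredictionMSD, and scikit-learn's k-means++ stands in for the paper's centralized subroutine (Arthur & Vassilvitskii, 2007).

```python
# Sketch of the setup: a data-set in R^90 is randomly divided among T = 3
# parties, each holding 30 of the attributes ("distributed dimensions").
# The k-means++ call is a stand-in centralized subroutine, NOT the paper's
# DISTDIM-K-MEANS algorithm.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder data standing in for YearPredictionMSD (10^5 points in R^90).
n_points, n_dims, T = 100_000, 90, 3
X = rng.standard_normal((n_points, n_dims))

# Randomly assign the 90 attributes to T = 3 parties, 30 attributes each.
perm = rng.permutation(n_dims)
party_dims = np.array_split(perm, T)
party_views = [X[:, dims] for dims in party_dims]  # each party sees only its columns

# Each party runs a local k-means++ on its 30-dimensional view.
k = 5  # the paper sweeps k from 5 to 100
local_models = [
    KMeans(n_clusters=k, init="k-means++", n_init=10).fit(view)
    for view in party_views
]

for t, model in enumerate(local_models):
    print(f"party {t}: local k-means cost = {model.inertia_:.2f}")
```

The local clusterings here are only the first stage; the paper's contribution is how such per-party results are combined across dimensions, which this sketch does not reproduce.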