Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

K-Means Clustering with Distributed Dimensions

Authors: Hu Ding, Yu Liu, Lingxiao Huang, Jian Li

ICML 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.
Researcher Affiliation | Academia | Hu Ding (EMAIL), Computer Science and Engineering, Michigan State University, East Lansing, MI, USA; Yu Liu (EMAIL), Lingxiao Huang (EMAIL), Jian Li (EMAIL), Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1 DISTDIM-K-MEANS; Algorithm 2 GRID
Open Source Code | No | The paper does not provide explicit statements or links for open-source code availability.
Open Datasets | Yes | We first choose a real-world data-set from (Bache & Lichman, 2013), Year Prediction MSD, which contains 10^5 points in R^90. ...we also implement our algorithm DISTDIM-K-MEANS on another data-set, Bag of Words (NYTimes), from (Bache & Lichman, 2013)
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts) for training, validation, or testing, nor does it refer to predefined standard splits for these purposes.
Hardware Specification | No | The paper discusses computation in a distributed setting with 'multiple machines' but does not specify any particular hardware components such as CPU models, GPU models, or memory used for the experiments.
Software Dependencies | No | The paper mentions using algorithms from prior work (Arthur & Vassilvitskii, 2007; Chawla & Gionis, 2013) as centralized subroutines, but it does not name any software with version numbers used for implementation or analysis.
Experiment Setup | Yes | We randomly divide the data-set into 3 parties, with each having 30 attributes (i.e., T = 3), and set k = 5-100. Also, for k-means clustering with outliers, we set the number of outliers z = 500.
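The Experiment Setup row above can be sketched as a small simulation. This is a minimal sketch under assumptions: synthetic Gaussian data stands in for the real data-sets (Year Prediction MSD, Bag of Words NYTimes), the sizes n = 1000 and k = 5 are illustrative, and the clustering routine is a generic centralized Lloyd's k-means baseline, not the paper's DISTDIM-K-MEANS or GRID algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's setting: n points in R^90.
n, d, T = 1000, 90, 3
X = rng.normal(size=(n, d))

# Randomly divide the d = 90 attributes among T = 3 parties, 30 each,
# mirroring the quoted setup ("3 parties with each having 30 attributes").
perm = rng.permutation(d)
parties = np.array_split(perm, T)

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's iterations (centralized baseline, not the paper's method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

centers, labels = lloyd_kmeans(X, k=5)
cost = ((X - centers[labels]) ** 2).sum()
```

In the paper's setting each party would see only its own 30-column slice of X; the baseline above instead clusters the full matrix, which is the centralized reference such distributed algorithms are compared against.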