Fast Algorithms for Distributed k-Clustering with Outliers
Authors: Junyu Huang, Qilong Feng, Ziyun Huang, Jinhui Xu, Jianxin Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments suggest that the proposed 2-round distributed algorithms outperform other state-of-the-art algorithms. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Central South University, Changsha 410083, China; (2) Xiangjiang Laboratory, Changsha 410205, China; (3) Department of Computer Science and Software Engineering, Penn State Erie, The Behrend College; (4) Department of Computer Science and Engineering, State University of New York at Buffalo, NY, USA; (5) The Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha 410083, China. |
| Pseudocode | Yes | Algorithm 1 IRS; Algorithm 2 MCA; Algorithm 3 Distributed (k, z)-center using IRS; Algorithm 4 SNS; Algorithm 5 Distributed (k, z)-center using SNS; Algorithm 6 Distributed (k, z)-median/means by IRS; Algorithm 7 Distributed (k, z)-median/means by SNS |
| Open Source Code | No | The paper mentions a GitHub link in footnote 3 for 'dist kzc' (Li & Guo, 2018), which refers to source code from a related work used for comparison, not the authors' own implementation described in this paper. |
| Open Datasets | Yes | In this section, we evaluate the performance of our algorithms on several real-world datasets (https://archive.ics.uci.edu/ml/index.php, http://corpus-texmex.irisa.fr), including 3 small datasets (letter: 20,000 × 16; skin: 245,057 × 3; covertype: 581,012 × 54) and 3 large datasets (gas: 928,991 × 10; higgs: 11,000,000 × 27; sift: 100,000,000 × 128). |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing. |
| Hardware Specification | Yes | For hardware, we use a machine with 72 Intel Xeon Gold 6230 CPUs and 1TB memory. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | In our experiments, we fix the parameter η = 0.5 and multiply the sampling rounds by a factor β = 0.01. For each parameter setting, the experiments are repeated five times, and we take the average results. (A minimal harness reflecting this protocol is sketched below the table.) |
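
The algorithms listed in the Pseudocode row (IRS, MCA, SNS, and the distributed (k, z)-center/median/means routines built on them) are not released as code. The sketch below is only a hypothetical illustration of the general 2-round pattern referenced in the Research Type row: each machine sends a small uniform sample to a coordinator, which solves a (k, z)-center instance on the union of samples and treats the z farthest points as outliers. The function names, sample sizes, and the Gonzalez greedy solver are assumptions for illustration, not the paper's IRS/SNS procedures.

```python
# Hypothetical 2-round distributed (k, z)-center sketch via uniform sampling.
# NOT the paper's IRS/SNS algorithms; only the generic communication pattern.
import numpy as np

def local_round(partition, sample_size, rng):
    """Round 1 (each machine): send a uniform sample of the local partition."""
    idx = rng.choice(len(partition), size=min(sample_size, len(partition)), replace=False)
    return partition[idx]

def gonzalez_k_center(points, k, rng):
    """Greedy farthest-first traversal, used here as a simple centralized k-center solver."""
    centers = [points[rng.integers(len(points))]]
    dists = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dists))
        centers.append(points[far])
        dists = np.minimum(dists, np.linalg.norm(points - points[far], axis=1))
    return np.array(centers)

def distributed_kz_center(partitions, k, z, sample_size, seed=0):
    """Round 2 (coordinator): cluster the union of samples, ignoring the z farthest points."""
    rng = np.random.default_rng(seed)
    union = np.vstack([local_round(p, sample_size, rng) for p in partitions])
    centers = gonzalez_k_center(union, k, rng)
    # Distance of every sampled point to its nearest center.
    d = np.min(np.linalg.norm(union[:, None, :] - centers[None, :, :], axis=2), axis=1)
    radius = np.sort(d)[-(z + 1)] if z > 0 else d.max()
    return centers, radius

# Toy usage: four "machines", each holding a random local partition.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    parts = [rng.normal(size=(500, 3)) for _ in range(4)]
    centers, radius = distributed_kz_center(parts, k=5, z=20, sample_size=100)
    print(centers.shape, round(radius, 3))
```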
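
The Experiment Setup row fixes η = 0.5, scales the sampling rounds by β = 0.01, and averages over five repetitions per parameter setting. Below is a minimal, hypothetical harness reflecting that protocol; `run_once` is a placeholder for one execution of a distributed (k, z)-clustering algorithm and would be swapped for an actual implementation.

```python
# Hypothetical repetition-and-averaging harness mirroring the quoted setup.
import numpy as np

ETA = 0.5        # fixed parameter from the quoted setup
BETA = 0.01      # factor scaling the number of sampling rounds
REPEATS = 5      # repetitions per parameter setting

def run_once(data, k, z, eta, beta, seed):
    """Placeholder for one run of a distributed (k, z)-clustering algorithm.
    Returns (clustering cost, wall-clock seconds); swap in a real implementation."""
    rng = np.random.default_rng(seed)
    return rng.uniform(1.0, 2.0), rng.uniform(0.1, 0.5)

def evaluate(data, k, z):
    """Repeat one parameter setting REPEATS times and average cost and running time."""
    results = [run_once(data, k, z, ETA, BETA, seed) for seed in range(REPEATS)]
    costs, times = zip(*results)
    return float(np.mean(costs)), float(np.mean(times))

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=(1000, 8))
    print(evaluate(data, k=10, z=20))  # k and z chosen arbitrarily for the toy example
```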