Fast Algorithms for Distributed k-Clustering with Outliers
Authors: Junyu Huang, Qilong Feng, Ziyun Huang, Jinhui Xu, Jianxin Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments suggest that the proposed 2-round distributed algorithms outperform other state-of-the-art algorithms. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Central South University, Changsha 410083, China; (2) Xiangjiang Laboratory, Changsha 410205, China; (3) Department of Computer Science and Software Engineering, Penn State Erie, The Behrend College; (4) Department of Computer Science and Engineering, State University of New York at Buffalo, NY, USA; (5) The Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha 410083, China. |
| Pseudocode | Yes | Algorithm 1 IRS; Algorithm 2 MCA; Algorithm 3 Distributed (k, z)-center using IRS; Algorithm 4 SNS; Algorithm 5 Distributed (k, z)-center using SNS; Algorithm 6 Distributed (k, z)-median/means by IRS; Algorithm 7 Distributed (k, z)-median/means by SNS |
| Open Source Code | No | The paper mentions a GitHub link in footnote 3 for 'dist kzc' (Li & Guo, 2018), which refers to source code from a related work used for comparison, not the authors' own implementation described in this paper. |
| Open Datasets | Yes | In this section, we evaluate the performance of our algorithms on several real-world datasets (https://archive.ics.uci.edu/ml/index.php, http://corpus-texmex.irisa.fr), including 3 small datasets (letter: 20,000 × 16; skin: 245,057 × 3; covertype: 581,012 × 54) and 3 large datasets (gas: 928,991 × 10; higgs: 11,000,000 × 27; sift: 100,000,000 × 128). |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing. |
| Hardware Specification | Yes | For hardware, we use a machine with 72 Intel Xeon Gold 6230 CPUs and 1TB memory. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | In our experiments, we fix the parameter η = 0.5 and multiply the sampling rounds by a factor β = 0.01. For each parameter setting, the experiments are repeated five times, and we take the average results. (A minimal harness reflecting this protocol is sketched below the table.) |
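
The algorithms listed in the Pseudocode row (IRS, MCA, SNS, and the distributed (k, z)-center/median/means routines built on them) are not released as code. The sketch below is only a hypothetical illustration of the general 2-round pattern referenced in the Research Type row: each machine sends a small uniform sample to a coordinator, which solves a (k, z)-center instance on the union of samples and treats the z farthest points as outliers. The function names, sample sizes, and the Gonzalez greedy solver are assumptions for illustration, not the paper's IRS/SNS procedures.

```python
# Hypothetical 2-round distributed (k, z)-center sketch via uniform sampling.
# NOT the paper's IRS/SNS algorithms; only the generic communication pattern.
import numpy as np

def local_round(partition, sample_size, rng):
    """Round 1 (each machine): send a uniform sample of the local partition."""
    idx = rng.choice(len(partition), size=min(sample_size, len(partition)), replace=False)
    return partition[idx]

def gonzalez_k_center(points, k, rng):
    """Greedy farthest-first traversal, used here as a simple centralized k-center solver."""
    centers = [points[rng.integers(len(points))]]
    dists = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dists))
        centers.append(points[far])
        dists = np.minimum(dists, np.linalg.norm(points - points[far], axis=1))
    return np.array(centers)

def distributed_kz_center(partitions, k, z, sample_size, seed=0):
    """Round 2 (coordinator): cluster the union of samples, ignoring the z farthest points."""
    rng = np.random.default_rng(seed)
    union = np.vstack([local_round(p, sample_size, rng) for p in partitions])
    centers = gonzalez_k_center(union, k, rng)
    # Distance of every sampled point to its nearest center.
    d = np.min(np.linalg.norm(union[:, None, :] - centers[None, :, :], axis=2), axis=1)
    radius = np.sort(d)[-(z + 1)] if z > 0 else d.max()
    return centers, radius

# Toy usage: four "machines", each holding a random local partition.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    parts = [rng.normal(size=(500, 3)) for _ in range(4)]
    centers, radius = distributed_kz_center(parts, k=5, z=20, sample_size=100)
    print(centers.shape, round(radius, 3))
```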
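
The Experiment Setup row fixes η = 0.5, scales the sampling rounds by β = 0.01, and averages over five repetitions per parameter setting. Below is a minimal, hypothetical harness reflecting that protocol; `run_once` is a placeholder for one execution of a distributed (k, z)-clustering algorithm and would be swapped for an actual implementation.

```python
# Hypothetical repetition-and-averaging harness mirroring the quoted setup.
import numpy as np

ETA = 0.5        # fixed parameter from the quoted setup
BETA = 0.01      # factor scaling the number of sampling rounds
REPEATS = 5      # repetitions per parameter setting

def run_once(data, k, z, eta, beta, seed):
    """Placeholder for one run of a distributed (k, z)-clustering algorithm.
    Returns (clustering cost, wall-clock seconds); swap in a real implementation."""
    rng = np.random.default_rng(seed)
    return rng.uniform(1.0, 2.0), rng.uniform(0.1, 0.5)

def evaluate(data, k, z):
    """Repeat one parameter setting REPEATS times and average cost and running time."""
    results = [run_once(data, k, z, ETA, BETA, seed) for seed in range(REPEATS)]
    costs, times = zip(*results)
    return float(np.mean(costs)), float(np.mean(times))

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=(1000, 8))
    print(evaluate(data, k=10, z=20))  # k and z chosen arbitrarily for the toy example
```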