Balanced Clustering: A Uniform Model and Fast Algorithm
Authors: Weibo Lin, Zhu He, Mingyu Xiao
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results over benchmarks validate the advantage of our algorithm compared to the state-of-the-art balanced clustering algorithms. On most datasets, our algorithm runs more than 100 times faster than previous algorithms with a better solution. |
| Researcher Affiliation | Collaboration | Weibo Lin¹, Zhu He¹٬² and Mingyu Xiao¹ — ¹School of Computer Science, University of Electronic Science and Technology of China; ²Zhejiang Cainiao Supply Chain Management Co. Ltd. |
| Pseudocode | Yes | Algorithm 1 Regularized k-means with warm start |
| Open Source Code | Yes | The codes and data in this paper are publicly available: https://github.com/zhu-he/regularized-k-means |
| Open Datasets | Yes | In the experiments, we consider ten datasets, including six real-world UCI datasets (https://archive.ics.uci.edu/ml/index.php), two artificial datasets s1–s2 (http://cs.uef.fi/sipu/datasets/) which have Gaussian clusters with increasing overlap, and two MNIST datasets (http://yann.lecun.com/exdb/mnist/) of handwritten digits. |
| Dataset Splits | No | The paper refers to "MNIST-train" and "MNIST-test" datasets, implying training and testing sets. However, it does not provide specific details about how data was split for training, validation, and testing (e.g., percentages, counts, or a specific validation split methodology). |
| Hardware Specification | Yes | As a platform, Intel Core i7-8700K 3.70 GHz processor with 16GB memory was used. |
| Software Dependencies | No | The paper mentions that "LKM was re-implemented in C++ by ourselves. Our method is also implemented in C++." However, it does not provide specific version numbers for any C++ compiler, libraries, or other software dependencies used in the experiments. |
| Experiment Setup | Yes | Regularization settings. To get a contrastive result between our method and LKM, we set the regularization term in our model the same as that in LKM by letting f_1(x) = ··· = f_k(x) = λx² (16), where λ is the balance parameter. In order to obtain different balancing performance, different values of λ are tested. Specifically, let Var = Σ_{i=1}^n ‖x_i − µ‖₂², where µ = (1/n) Σ_{i=1}^n x_i, be the variance of the dataset {x_i}_{i=1}^n; the values of λ are uniformly chosen from the interval [0, 40·Var/(kn²)]. For hard-balanced clustering, we use the following functions to derive a strictly balanced result: f_h(x) = M(n/k − x) if x < n/k, M(x − n/k) if x > n/k, and 0 otherwise (17), for h = 1, …, k, where M is a large real number. In the experiments, it is sufficient to set M = Var. |
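The experiment setup above can be sketched in code. This is a minimal illustration, not the authors' implementation (theirs is in C++): it assumes a data matrix `X` of shape `(n, d)`, and the function names (`lambda_grid`, `soft_penalty`, `hard_penalty`) are hypothetical.

```python
import numpy as np

def lambda_grid(X, k, num=10):
    """Hypothetical helper: balance parameters lambda sampled uniformly
    from [0, 40*Var / (k*n^2)], matching the paper's sweep."""
    n = X.shape[0]
    mu = X.mean(axis=0)
    var = np.sum((X - mu) ** 2)  # Var = sum_i ||x_i - mu||_2^2
    return np.linspace(0.0, 40.0 * var / (k * n * n), num)

def soft_penalty(x, lam):
    """Soft-balance regularizer f_h(x) = lambda * x^2 (Eq. 16);
    x is a cluster size."""
    return lam * x * x

def hard_penalty(x, n, k, M):
    """Hard-balance regularizer (Eq. 17): linear penalty with large
    constant M on any deviation of cluster size x from the target n/k."""
    target = n / k
    if x < target:
        return M * (target - x)
    if x > target:
        return M * (x - target)
    return 0.0
```

With the hard penalty, any cluster whose size deviates from n/k pays a cost proportional to the large constant M (the paper sets M = Var), so the optimizer is driven to strictly balanced cluster sizes.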