A Scalable Deterministic Global Optimization Algorithm for Clustering Problems

Authors: Kaixun Hua, Mingfei Shi, Yankai Cao

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We performed numerical experiments on both synthetic and real-world datasets and compared our proposed algorithms with the off-the-shelf global optimal solvers and classical local optimal algorithms. The results reveal a strong performance and scalability of our algorithm.
Researcher Affiliation Academia 1Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
Pseudocode Yes Algorithm 1 Branch-and-Bound Clustering Scheme
Open Source Code Yes We present an open-source implementation of the proposed algorithm in Julia. ... The complete code files can be found in https://github. com/kingsley1989/global_kmeans.
Open Datasets Yes The Iris dataset (Fisher, 1936) has 150 samples and 4 attributes, while the Seeds dataset (Charytanowicz et al., 2010) contains 210 samples and 7 attributes. ... The Padberg and Rinald s hole drilling dataset (Padberg & Rinaldi, 1991) ... The glass identification dataset (Dua & Graff, 2017)
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits, as it focuses on unsupervised clustering problems which typically operate on the full dataset to find optimal clusters.
Hardware Specification Yes All serial experiments are executed on a 2.3GHz quad-core 10th-generation Intel Core i7 processor with 16G RAM. The parallel experiments are executed on the Niagara Cluster of Compute Canada.
Software Dependencies Yes Our algorithm is implemented in Julia 1.5.3. ... compared with the off-the-shelf global optimal solvers and classical local optimal algorithms. The result of k-means clustering algorithm is generated by repeating with 100 trials using random initialization. ... classic BB algorithm implemented in the state-of-the-art global optimizer CPLEX 20.1.0 (Cplex, 2020), and classic k-means clustering algorithm implemented in Julia Package Clustering.
Experiment Setup Yes The result of k-means clustering algorithm is generated by repeating with 100 trials using random initialization. The worst, average and best k-means objectives are recorded. Both our algorithm and CPLEX terminate with a time limit of 4 hours. ... For sample grouping, we limited each group s size under min(162/d k, 10 k) when decomposing the lower bounding problem. ... All datasets are generated with 3 Gaussian clusters randomly with seed = 1. ... For each dataset, we solve two clustering problems (k = 3 and k = 4).