reproducibilityindex.ai

A Scalable Deterministic Global Optimization Algorithm for Clustering Problems

Authors: Kaixun Hua, Mingfei Shi, Yankai Cao

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We performed numerical experiments on both synthetic and real-world datasets and compared our proposed algorithms with the off-the-shelf global optimal solvers and classical local optimal algorithms. The results reveal a strong performance and scalability of our algorithm.
Researcher Affiliation	Academia	1Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
Pseudocode	Yes	Algorithm 1 Branch-and-Bound Clustering Scheme
Open Source Code	Yes	We present an open-source implementation of the proposed algorithm in Julia. ... The complete code ﬁles can be found in https://github. com/kingsley1989/global_kmeans.
Open Datasets	Yes	The Iris dataset (Fisher, 1936) has 150 samples and 4 attributes, while the Seeds dataset (Charytanowicz et al., 2010) contains 210 samples and 7 attributes. ... The Padberg and Rinald s hole drilling dataset (Padberg & Rinaldi, 1991) ... The glass identiﬁcation dataset (Dua & Graff, 2017)
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits, as it focuses on unsupervised clustering problems which typically operate on the full dataset to find optimal clusters.
Hardware Specification	Yes	All serial experiments are executed on a 2.3GHz quad-core 10th-generation Intel Core i7 processor with 16G RAM. The parallel experiments are executed on the Niagara Cluster of Compute Canada.
Software Dependencies	Yes	Our algorithm is implemented in Julia 1.5.3. ... compared with the off-the-shelf global optimal solvers and classical local optimal algorithms. The result of k-means clustering algorithm is generated by repeating with 100 trials using random initialization. ... classic BB algorithm implemented in the state-of-the-art global optimizer CPLEX 20.1.0 (Cplex, 2020), and classic k-means clustering algorithm implemented in Julia Package Clustering.
Experiment Setup	Yes	The result of k-means clustering algorithm is generated by repeating with 100 trials using random initialization. The worst, average and best k-means objectives are recorded. Both our algorithm and CPLEX terminate with a time limit of 4 hours. ... For sample grouping, we limited each group s size under min(162/d k, 10 k) when decomposing the lower bounding problem. ... All datasets are generated with 3 Gaussian clusters randomly with seed = 1. ... For each dataset, we solve two clustering problems (k = 3 and k = 4).