A Three-Level Optimization Model for Nonlinearly Separable Clustering

Authors: Liang Bai, Jiye Liang (pp. 3211-3218)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The performance of this algorithm has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm."
Researcher Affiliation | Academia | Liang Bai, Jiye Liang. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, China. {bailiang, ljy}@sxu.edu.cn
Pseudocode | Yes | Algorithm 1: The NKM-NSC algorithm.
Open Source Code | No | The paper provides no statement or link indicating that source code for the proposed method is publicly available.
Open Datasets | Yes | Synthetic data sets: Ring (1,500 objects, 3 clusters), Jain (373 objects, 2 clusters), Flame (240 objects, 2 clusters), Agg (788 objects, 7 clusters), T4.8k (7,235 objects, 6 clusters), T7.1k (3,031 objects, 9 clusters), Chain (1,000 objects, 2 clusters), and Atom (800 objects, 2 clusters). Real data sets: Wine (178 objects, 13 features, 3 clusters), Breast Cancer (569 objects, 30 features, 2 clusters), Handwritten Digits (5,620 objects, 63 features, 10 clusters), Landsat Satellite (6,435 objects, 36 features, 7 clusters), MNIST (10,000 objects, 784 features, 10 clusters), and KDD-CUP 99 (1,048,576 objects, 39 features, 2 clusters). Benchmark source: https://github.com/deric/clustering-benchmark
Dataset Splits | No | The paper states that each algorithm "runs 30 times to compute the mean and standard deviation of ARI and NMI on each data set," but it specifies no explicit training/validation/test splits (e.g., 80/10/10) for reproducibility.
Hardware Specification | Yes | The experiments are conducted on an Intel i7-4710MQ personal computer with 16 GB of RAM.
Software Dependencies | No | The paper gives no version numbers for any software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | For the NKM-NSC algorithm, the authors set α = β = γ = 1, the number of linear clusters p = n, the number of ensemble clusterings T = 12, and the maximum number of iterations τ = 10. For each algorithm, the number of clusters k is set equal to the true number of classes on each data set. A Gaussian kernel function produces the distance or similarity matrix, and each algorithm is tested with several values of the kernel parameter δ, i.e., δ = εX, εX/10, εX/20, εX/30, εX/40, εX/50, where εX is the average pairwise distance of data set X; the highest resulting ARI and NMI values are selected for comparison.
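The δ-sweep protocol described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the exact kernel normalization (2δ² in the exponent's denominator) and the use of the mean over all unordered pairs for εX are assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_similarity(X, delta):
    """Pairwise Gaussian-kernel similarity S_ij = exp(-||x_i - x_j||^2 / (2 * delta^2)).

    The 2*delta^2 normalization is an assumed convention; the paper only
    says a Gaussian kernel with parameter delta is used."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-d2 / (2.0 * delta ** 2))

def average_pairwise_distance(X):
    """eps_X: mean Euclidean distance over all unordered pairs of objects."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    n = X.shape[0]
    return d[np.triu_indices(n, k=1)].mean()

# Sweep delta over eps_X and its fractions, as in the paper's protocol:
# delta = eps_X, eps_X/10, eps_X/20, eps_X/30, eps_X/40, eps_X/50.
rng = np.random.default_rng(0)
X = rng.random((50, 2))
eps_X = average_pairwise_distance(X)
deltas = [eps_X] + [eps_X / f for f in (10, 20, 30, 40, 50)]
similarity_matrices = [gaussian_similarity(X, d) for d in deltas]
```

In the paper's protocol, each similarity matrix in the sweep would be fed to the clustering algorithm, and the δ yielding the highest ARI/NMI would be reported.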