A Three-Level Optimization Model for Nonlinearly Separable Clustering
Authors: Liang Bai, Jiye Liang | Pages: 3211-3218
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of this algorithm has been studied on synthetical and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm. |
| Researcher Affiliation | Academia | Liang Bai, Jiye Liang Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, China {bailiang, ljy}@sxu.edu.cn |
| Pseudocode | Yes | Algorithm 1: The NKM-NSC algorithm |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the proposed methodology is publicly available. |
| Open Datasets | Yes | The synthetic data sets include Ring (1,500 objects and 3 clusters), Jain (373 objects and 2 clusters), Flame (240 objects and 2 clusters), Agg (788 objects and 7 clusters), T4.8k (7,235 objects and 6 clusters), T7.1k (3,031 objects and 9 clusters), Chain (1,000 objects and 2 clusters) and Atom (800 objects and 2 clusters). The real data sets include Wine (178 objects, 13 features and 3 clusters), Breast Cancer (569 objects, 30 features and 2 clusters), Handwritten Digits (5,620 objects, 63 features and 10 clusters), Landsat Satellite (6,435 objects, 36 features and 7 clusters), MNIST (10,000 objects, 784 features and 10 clusters) and KDD-CUP 99 (1,048,576 objects, 39 features and 2 clusters). Cited source: Clustering benchmarks, https://github.com/deric/clustering-benchmark. |
| Dataset Splits | No | The paper mentions that 'each of them runs 30 times to compute the mean and standard deviation of ARI and NMI on each data set.' However, it does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10) for reproducibility. |
| Hardware Specification | Yes | The experiments are conducted on an Intel i7-4710MQ personal computer with 16G RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | For the NKM-NSC algorithm, we set α = β = γ = 1, the number of linear clusters p = n, the number of ensemble clusterings T = 12 and the maximum number of iterations τ = 10, respectively. For each algorithm, we first set the number of clusters k equal to the true number of classes on each of the given data sets. Furthermore, we use a Gaussian kernel function to produce the distance or similarity matrix and test each of these algorithms with different δ values of the kernel parameter, i.e., δ = εX, εX/10, εX/20, εX/30, εX/40, εX/50, where εX is the average distance of data set X, to select the highest ARI and NMI values for comparison. |
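
The kernel-parameter sweep described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "average distance" means the mean pairwise Euclidean distance, and that the Gaussian kernel takes the common form exp(-d² / (2δ²)); the paper excerpt states neither. All function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs.

    Assumption: this is one reading of 'the average distance of data set X'.
    """
    pairs = list(combinations(points, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def delta_grid(points, divisors=(1, 10, 20, 30, 40, 50)):
    """Candidate kernel widths: eps, eps/10, eps/20, ..., eps/50."""
    eps = average_pairwise_distance(points)
    return [eps / d for d in divisors]


def gaussian_kernel_matrix(points, delta):
    """Similarity matrix K[i][j] = exp(-d(i, j)^2 / (2 * delta^2)).

    Assumption: the exact kernel form is not given in the excerpt.
    """
    n = len(points)
    return [
        [
            math.exp(-euclidean(points[i], points[j]) ** 2 / (2 * delta ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: three collinear points with pairwise distances 5, 10 and 5.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
deltas = delta_grid(points)
kernels = [gaussian_kernel_matrix(points, delta) for delta in deltas]
```

Under this reading, each clustering algorithm would be run once per matrix in `kernels`, and the δ giving the highest ARI and NMI would be kept for comparison. For large data sets such as KDD-CUP 99, the all-pairs average would need sampling or a vectorized implementation.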