A Three-Level Optimization Model for Nonlinearly Separable Clustering

Authors: Liang Bai, Jiye Liang (pp. 3211-3218)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The performance of this algorithm has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm."
Researcher Affiliation | Academia | Liang Bai, Jiye Liang. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, China. {bailiang, ljy}@sxu.edu.cn
Pseudocode | Yes | Algorithm 1: The NKM-NSC algorithm.
Open Source Code | No | The paper provides no statement or link indicating that source code for the proposed method is publicly available.
Open Datasets | Yes | Synthetic data sets: Ring (1,500 objects, 3 clusters), Jain (373 objects, 2 clusters), Flame (240 objects, 2 clusters), Agg (788 objects, 7 clusters), T4.8k (7,235 objects, 6 clusters), T7.1k (3,031 objects, 9 clusters), Chain (1,000 objects, 2 clusters), and Atom (800 objects, 2 clusters). Real data sets: Wine (178 objects, 13 features, 3 clusters), Breast Cancer (569 objects, 30 features, 2 clusters), Handwritten Digits (5,620 objects, 63 features, 10 clusters), Landsat Satellite (6,435 objects, 36 features, 7 clusters), MNIST (10,000 objects, 784 features, 10 clusters), and KDD-CUP 99 (1,048,576 objects, 39 features, 2 clusters). Benchmark source: https://github.com/deric/clustering-benchmark
Dataset Splits | No | The paper states that each algorithm "runs 30 times to compute the mean and standard deviation of ARI and NMI on each data set," but it specifies no explicit training/validation/test splits (e.g., 80/10/10) for reproducibility.
Hardware Specification | Yes | The experiments are conducted on an Intel i7-4710MQ personal computer with 16 GB of RAM.
Software Dependencies | No | The paper gives no version numbers for any software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | For the NKM-NSC algorithm, the authors set α = β = γ = 1, the number of linear clusters p = n, the number of ensemble clusterings T = 12, and the maximum number of iterations τ = 10. For each algorithm, the number of clusters k is set equal to the true number of classes on each data set. A Gaussian kernel function produces the distance or similarity matrix, and each algorithm is tested with several values of the kernel parameter δ, i.e., δ = εX, εX/10, εX/20, εX/30, εX/40, εX/50, where εX is the average pairwise distance of data set X; the highest resulting ARI and NMI values are selected for comparison.
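The δ-sweep protocol described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the exact kernel normalization (2δ² in the exponent's denominator) and the use of the mean over all unordered pairs for εX are assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_similarity(X, delta):
    """Pairwise Gaussian-kernel similarity S_ij = exp(-||x_i - x_j||^2 / (2 * delta^2)).

    The 2*delta^2 normalization is an assumed convention; the paper only
    says a Gaussian kernel with parameter delta is used."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-d2 / (2.0 * delta ** 2))

def average_pairwise_distance(X):
    """eps_X: mean Euclidean distance over all unordered pairs of objects."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    n = X.shape[0]
    return d[np.triu_indices(n, k=1)].mean()

# Sweep delta over eps_X and its fractions, as in the paper's protocol:
# delta = eps_X, eps_X/10, eps_X/20, eps_X/30, eps_X/40, eps_X/50.
rng = np.random.default_rng(0)
X = rng.random((50, 2))
eps_X = average_pairwise_distance(X)
deltas = [eps_X] + [eps_X / f for f in (10, 20, 30, 40, 50)]
similarity_matrices = [gaussian_similarity(X, d) for d in deltas]
```

In the paper's protocol, each similarity matrix in the sweep would be fed to the clustering algorithm, and the δ yielding the highest ARI/NMI would be reported.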