Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering
Authors: Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee
Pages: 4755-4762
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot. |
| Researcher Affiliation | Academia | Xiaoyu Qin, Monash University, Victoria, Australia 3800, xiaoyu.qin@ieee.org; Kai Ming Ting, Federation University, Victoria, Australia 3842, kaiming.ting@federation.edu.au; Ye Zhu, Deakin University, Victoria, Australia 3125, ye.zhu@ieee.org; Vincent CS Lee, Monash University, Victoria, Australia 3800, vincent.cs.lee@monash.edu |
| Pseudocode | No | N/A |
| Open Source Code | Yes | All algorithms used in our experiments are implemented in Matlab (the source code with demo can be obtained from https://github.com/cswords/anne-dbscan-demo). |
| Open Datasets | Yes | The artificial datasets are from http://cs.uef.fi/sipu/datasets/ (Gionis, Mannila, and Tsaparas 2007; Zahn 1971; Chang and Yeung 2008; Jain and Law 2005) except that the hard distribution dataset is from https://sourceforge.net/p/density-ratio/ (Zhu, Ting, and Carman 2016), 5 high-dimensional data are from http://featureselection.asu.edu/datasets.php (Li et al. 2016), and the rest of the datasets are from http://archive.ics.uci.edu/ml (Dheeru and Karra Taniskidou 2017). |
| Dataset Splits | No | We compared all clustering results in terms of the best F1 score (Rijsbergen 1979) that is obtained from a search of the algorithm’s parameter. We search each parameter within a reasonable range. |
| Hardware Specification | Yes | The experiments ran on a machine having CPU: i5-8600k 4.30GHz processor, 8GB RAM; and GPU: GTX Titan X with 3072 1075MHz CUDA (Owens et al. 2008) cores & 12GB graphic memory. |
| Software Dependencies | No | All algorithms used in our experiments are implemented in Matlab (the source code with demo can be obtained from https://github.com/cswords/anne-dbscan-demo). We produced the GPU accelerated versions of all implementations. |
| Experiment Setup | Yes | The ranges used for all algorithms/dissimilarities are provided in Table 2. |
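
The Research Type entry describes replacing the distance measure in DBSCAN with a nearest-neighbour-induced Isolation Similarity while leaving the rest of the procedure unchanged. The following is a minimal Python sketch of that idea, not the authors' Matlab implementation. It assumes the similarity of two points can be estimated as the fraction of random partitionings in which both fall into the same nearest-neighbour (Voronoi) cell. The function names `anne_dissimilarity` and `dbscan_precomputed`, the parameters `psi` and `t`, and the values chosen for `eps` and `min_pts` are all illustrative assumptions.

```python
import random


def anne_dissimilarity(points, psi=4, t=200, seed=0):
    """Return an n x n matrix of 1 - (estimated similarity).

    Assumed estimator: in each of t rounds, psi points are sampled as cell
    centres and every point is assigned to its nearest centre. The similarity
    of two points is the fraction of rounds in which they share a cell.
    """
    rng = random.Random(seed)
    n = len(points)
    shared = [[0] * n for _ in range(n)]
    for _ in range(t):
        centres = rng.sample(range(n), psi)
        # Index of the nearest sampled centre for every point.
        cell = [
            min(centres, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, points[c])))
            for p in points
        ]
        for i in range(n):
            for j in range(n):
                if cell[i] == cell[j]:
                    shared[i][j] += 1
    return [[1.0 - shared[i][j] / t for j in range(n)] for i in range(n)]


def dbscan_precomputed(dissim, eps, min_pts):
    """Plain DBSCAN on a precomputed dissimilarity matrix; -1 marks noise."""
    n = len(dissim)
    neighbours = [[j for j in range(n) if dissim[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Toy data (hypothetical): two well-separated Gaussian blobs of 30 points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(30)]
points += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(30)]

dissim = anne_dissimilarity(points)
labels = dbscan_precomputed(dissim, eps=0.6, min_pts=4)
```

Only the matrix handed to the clustering step changes; `dbscan_precomputed` is the usual core-point expansion. Because the dissimilarity lies in [0, 1], `eps` is searched on that scale rather than on the scale of the original distances.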