Scalable DBSCAN with Random Projections
Authors: HaoChuan Xu, Ninh Pham
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, s DBSCAN is significantly faster and provides higher accuracy than competitive DBSCAN variants on real-world million-point data sets. |
| Researcher Affiliation | Academia | Hao Chuan Xu School of Computer Science University of Auckland hxu612@aucklanduni.ac.nz Ninh Pham School of Computer Science University of Auckland ninh.pham@auckland.ac.nz |
| Pseudocode | Yes | Algorithm 1 DBSCAN |
| Open Source Code | Yes | Our code is available at https://github.com/Ninh Pham/s Dbscan. |
| Open Datasets | Yes | We conduct experiments on three popular data sets: Mnist (n = 70, 000, d = 784, # clusters = 10), Pamap2 (n = 1, 770, 131, d = 51, # clusters = 18), and Mnist8m (n = 8, 100, 000, d = 784, # clusters = 10). |
| Dataset Splits | No | We conduct experiments on three popular data sets: Mnist (n = 70, 000, d = 784, # clusters = 10), Pamap2 (n = 1, 770, 131, d = 51, # clusters = 18), and Mnist8m (n = 8, 100, 000, d = 784, # clusters = 10). |
| Hardware Specification | Yes | We conducted experiments on Ubuntu 20.04.4 with an AMD Ryzen Threadripper 3970X 2.2GHz 32-core processor (64 threads) with 128GB of DRAM. |
| Software Dependencies | Yes | We implement s DBSCAN and s OPTICS in C++ and compile with g++ -O3 -std=c++17 -fopenmp -march=native. We use the Eigen library 2 for SIMD vectorization on computing the distances. |
| Experiment Setup | Yes | Parameter settings. We consider min Pts = 50 for all experiments. s DBSCAN and s OPTICS use D = 1024, m = min Pts. Randomized kernel embeddings use σ = 2ε, d = 1024. We use k = 5 for Mnist and k = 10 for Pamap2 and Mnist8m. |