DBSCAN++: Towards fast and scalable density clustering

Authors: Jennifer Jang, Heinrich Jiang

ICML 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We show on both simulated datasets and real datasets that DBSCAN++ runs in a fraction of the time compared to DBSCAN, while giving competitive performance and consistently producing more robust clustering scores across hyperparameter settings.

Researcher Affiliation | Industry | Jennifer Jang (1), Heinrich Jiang (2); (1) Uber, (2) Google Research.

Pseudocode | Yes | Algorithm 1: DBSCAN; Algorithm 2: DBSCAN++; Algorithm 3: Greedy K-center Initialization.

Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.

Open Datasets | Yes | We used Phonemes (Friedman et al., 2001), a dataset of log periodograms of spoken phonemes, and MNIST, a sub-sample of the MNIST handwriting recognition dataset reduced to 20 dimensions with PCA. The rest of the datasets we used are standard UCI or Kaggle datasets used for clustering.

Dataset Splits | No | The paper lists its datasets and mentions tuning over values of a parameter 'p' via validation, but it does not specify the train/validation/test splits (percentages, absolute counts, or an explicit splitting procedure) needed to reproduce the data partitioning.

Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as CPU or GPU models or memory capacity.

Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions, or specific library versions) that would be needed to replicate the experiments.

Experiment Setup | Yes | We fixed minPts = 10 for all procedures throughout the experiments. DBSCAN was initialized with hyperparameters ε = 8 and minPts = 10, and DBSCAN++ with ε = 60, m/n = 0.3, and minPts = 10.
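The DBSCAN++ idea behind Algorithms 2 and 3, with the minPts and m/n settings from the experiment setup, can be illustrated roughly as follows. This is a minimal, unoptimized sketch written for this summary, not the authors' code (the paper links none). It assumes Euclidean distance, brute-force neighbour counting, that a point counts itself toward minPts, and that non-core points join the cluster of their nearest sampled core point when it lies within ε (otherwise they are noise); that last rule is an assumption, so check Algorithm 2 in the paper for the exact treatment. The function names `greedy_k_center` and `dbscan_pp` are hypothetical.

```python
import math
import random


def greedy_k_center(points, m, start=0):
    """Farthest-first traversal (greedy K-center): return m point indices."""
    m = min(m, len(points))
    chosen = [start]
    # nearest[i] = distance from point i to its closest chosen center so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        # The next center is the point farthest from every chosen center.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def dbscan_pp(points, eps, min_pts, m_over_n, init="k-center", seed=0):
    """Hypothetical DBSCAN++ sketch; returns one label per point (-1 = noise)."""
    n = len(points)
    m = max(1, min(n, round(m_over_n * n)))
    if init == "k-center":
        sample = greedy_k_center(points, m)
    else:  # uniform sampling without replacement
        sample = random.Random(seed).sample(range(n), m)

    # Core test only for the m sampled points, but neighbours are counted in
    # the FULL dataset (the point itself counts toward min_pts).
    core = [s for s in sample
            if sum(math.dist(points[s], q) <= eps for q in points) >= min_pts]

    # Union-find: two core points share a cluster if they are within eps.
    parent = {c: c for c in core}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, a in enumerate(core):
        for b in core[i + 1:]:
            if math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)

    # Give each connected component of core points a label 0, 1, 2, ...
    labels = [-1] * n
    roots = {}
    for c in core:
        labels[c] = roots.setdefault(find(c), len(roots))

    # ASSUMPTION: every other point joins the cluster of its nearest sampled
    # core point if that point is within eps; otherwise it stays noise (-1).
    core_set = set(core)
    for i, p in enumerate(points):
        if i in core_set or not core:
            continue
        nearest_core = min(core, key=lambda k: math.dist(p, points[k]))
        if math.dist(p, points[nearest_core]) <= eps:
            labels[i] = labels[nearest_core]
    return labels


# Usage: two well-separated Gaussian blobs, minPts = 10 and m/n = 0.3 as in
# the experiment setup; eps = 1.5 is chosen for this toy data only.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
blob_b = [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(100)]
data = blob_a + blob_b
labels = dbscan_pp(data, eps=1.5, min_pts=10, m_over_n=0.3)
print(sorted(set(labels)))
```

The paper's ε = 8 and ε = 60 are dataset-specific and carry no meaning on this toy example, which is why a small ε is used here. In this sketch, only the m sampled points trigger a full neighbourhood count, and that saving is what the DBSCAN++ runtime claims rest on. The scans are still brute force, so a spatial index such as a k-d tree would be the natural next step.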