DBSCAN++: Towards fast and scalable density clustering
Authors: Jennifer Jang, Heinrich Jiang
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We show on both simulated datasets and real datasets that DBSCAN++ runs in a fraction of the time compared to DBSCAN, while giving competitive performance and consistently producing more robust clustering scores across hyperparameter settings. |
| Researcher Affiliation | Industry | Jennifer Jang (Uber), Heinrich Jiang (Google Research). |
| Pseudocode | Yes | Algorithm 1 DBSCAN; Algorithm 2 DBSCAN++; Algorithm 3 Greedy K-center Initialization. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used Phonemes (Friedman et al., 2001), a dataset of log periodograms of spoken phonemes, and MNIST, a sub-sample of the MNIST handwriting recognition dataset after running a PCA down to 20 dimensions. The rest of the datasets we used are standard UCI or Kaggle datasets used for clustering. |
| Dataset Splits | No | The paper lists datasets and mentions tuning parameters on 'p' values via validation, but it does not provide specific train/validation/test dataset splits (e.g., percentages, absolute counts, or explicit splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions, or specific libraries with their versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | We fixed minPts = 10 for all procedures throughout experiments. DBSCAN was initiated with hyperparameters ε = 8 and minPts = 10, and DBSCAN++ with ε = 60, m/n = 0.3, and minPts = 10. |
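
The table names Algorithm 2 (DBSCAN++) and the hyperparameters ε, minPts, and m/n, but the paper releases no code. The following is a minimal sketch of how those hyperparameters could fit together, not the authors' implementation. It assumes the commonly described procedure: sample m of the n points uniformly, keep the sampled points that are core points with respect to the full dataset, link core points within ε of each other, and assign every point to the cluster of its nearest core point if that core point lies within ε. The function name `dbscan_pp`, the parameter `sample_fraction` (standing in for m/n), the brute-force distance loops, and the toy data are all assumptions made for illustration.

```python
import math
import random


def dbscan_pp(points, eps, min_pts, sample_fraction, seed=0):
    """Toy DBSCAN++-style clustering. Returns one label per point; -1 means noise."""
    n = len(points)
    m = max(1, min(n, round(sample_fraction * n)))
    rng = random.Random(seed)

    # Step 1: uniformly sample m candidate indices.
    sample = rng.sample(range(n), m)

    # Step 2: keep candidates that are core points with respect to ALL points
    # (the eps-ball count includes the point itself).
    def ball_size(i):
        return sum(1 for q in points if math.dist(points[i], q) <= eps)

    core = [i for i in sample if ball_size(i) >= min_pts]

    # Step 3: union-find over core points; link two core points within eps.
    parent = {i: i for i in core}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(core):
        for b in core[pos + 1:]:
            if math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)

    # Give each connected component a consecutive cluster id.
    cluster_id = {}
    for c in core:
        cluster_id.setdefault(find(c), len(cluster_id))

    # Step 4: assign every point to the cluster of its nearest core point,
    # or mark it as noise when no core point lies within eps.
    labels = []
    for p in points:
        best, best_d = None, math.inf
        for c in core:
            d = math.dist(p, points[c])
            if d < best_d:
                best, best_d = c, d
        in_reach = best is not None and best_d <= eps
        labels.append(cluster_id[find(best)] if in_reach else -1)
    return labels


# Toy data (hypothetical): two 5x5 grids far apart plus one isolated point.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
points = grid + [(x + 10.0, y + 10.0) for x, y in grid] + [(5.0, 5.0)]
labels = dbscan_pp(points, eps=1.0, min_pts=10, sample_fraction=0.6)
```

In this sketch only the m sampled points are tested for the core-point property, which is where the runtime saving over plain DBSCAN would come from. The brute-force loops are kept for readability; a practical version would likely use a spatial index for the ε-ball queries. The toy values of ε and the sampling fraction are chosen for the example and are not the paper's settings.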