reproducibilityindex.ai

DBSCAN++: Towards fast and scalable density clustering

Authors: Jennifer Jang, Heinrich Jiang

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We show on both simulated datasets and real datasets that DBSCAN++ runs in a fraction of the time compared to DBSCAN, while giving competitive performance and consistently producing more robust clustering scores across hyperparameter settings.
Researcher Affiliation	Industry	Jennifer Jang (Uber), Heinrich Jiang (Google Research).
Pseudocode	Yes	Algorithm 1 DBSCAN; Algorithm 2 DBSCAN++; Algorithm 3 Greedy K-center Initialization.
Open Source Code	No	The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We used Phonemes (Friedman et al., 2001), a dataset of log periodograms of spoken phonemes, and MNIST, a sub-sample of the MNIST handwriting recognition dataset after running a PCA down to 20 dimensions. The rest of the datasets we used are standard UCI or Kaggle datasets used for clustering.
Dataset Splits	No	The paper lists datasets and mentions tuning parameters on 'p' values via validation, but it does not provide specific train/validation/test dataset splits (e.g., percentages, absolute counts, or explicit splitting methodology) needed to reproduce the data partitioning.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions, or specific libraries with their versions) that would be needed to replicate the experiment.
Experiment Setup	Yes	We fixed minPts = 10 for all procedures throughout experiments. DBSCAN was initiated with hyperparameters ε = 8 and minPts = 10, and DBSCAN++ with ε = 60, m/n = 0.3, and minPts = 10.
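
The Pseudocode and Experiment Setup rows name DBSCAN++ and its hyperparameters (ε, minPts, and the sample fraction m/n). Since the table reports no released code, the following is only a minimal sketch of how such a procedure might look, not the authors' implementation. It assumes uniform sampling of candidate core points (rather than the greedy K-center initialization), Euclidean distance, and that a point counts itself toward minPts. The function name `dbscan_pp`, its parameters, the toy data, and the toy value eps = 1.0 are hypothetical; only minPts = 10 and m/n = 0.3 are taken from the setup above.

```python
import math
import random


def dbscan_pp(points, eps, min_pts, sample_fraction, seed=0):
    """Sketch of a DBSCAN++-style clustering; returns one label per point (-1 = noise)."""
    n = len(points)
    m = max(1, int(round(sample_fraction * n)))
    rng = random.Random(seed)
    # Assumption: candidate core points are drawn uniformly at random.
    sample = rng.sample(range(n), m)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # A sampled point is a core point if at least min_pts points of the
    # full dataset (itself included) lie within eps of it.
    core = [i for i in sample
            if sum(1 for j in range(n) if dist(i, j) <= eps) >= min_pts]

    # Clusters are connected components of the eps-graph over core points.
    labels = [-1] * n
    cluster = 0
    for start in core:
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and dist(u, v) <= eps:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1

    # Every other point joins the cluster of its nearest core point when
    # that core point is within eps; otherwise it stays labeled as noise.
    core_set = set(core)
    for i in range(n):
        if i in core_set or not core:
            continue
        nearest = min(core, key=lambda c: dist(i, c))
        if dist(i, nearest) <= eps:
            labels[i] = labels[nearest]
    return labels


# Toy data (hypothetical): two tight 5x5 grids far apart, plus one outlier.
blob_a = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
blob_b = [(10 + 0.1 * x, 10 + 0.1 * y) for x in range(5) for y in range(5)]
points = blob_a + blob_b + [(50.0, 50.0)]

# minPts = 10 and m/n = 0.3 follow the setup row; eps = 1.0 is a toy value.
labels = dbscan_pp(points, eps=1.0, min_pts=10, sample_fraction=0.3)
```

In this sketch the density check runs only for the m sampled points rather than all n, which is where a runtime saving over plain DBSCAN would come from. The cluster ids themselves depend on the random sample order, so only the grouping is meaningful.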