On Metric DBSCAN with Low Doubling Dimension

Authors: Hu Ding, Fan Yang, Mingyue Wang

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental results show that our algorithms can significantly outperform the existing DBSCAN algorithms in terms of running time." "Finally, we compare the experimental performances of our algorithms and several well-known baseline DBSCAN algorithms on both synthetic and real datasets."
Researcher Affiliation | Academia | The School of Computer Science and Technology, University of Science and Technology of China; huding@ustc.edu.cn, {yang208,mywang}@mail.ustc.edu.cn
Pseudocode | Yes | Algorithm 1: The Randomized Gonzalez's algorithm; Algorithm 2: Metric DBSCAN algorithm
Open Source Code | No | The paper states that "Our algorithms METRIC-1 and METRIC-2 are also implemented in C++", but it gives no concrete access to the source code (no link and no explicit statement of release).
Open Datasets | Yes | NEURIPS [Perrone et al., 2017] contains n = 11463 word vectors of the full texts of the NeurIPS conference papers published in 1987-2015. USPSHW [Hull, 1994] contains n = 7291 16 × 16 pixel handwritten letter images. MNIST [LeCun et al., 1998] contains n = 10000 handwritten digit images from 0 to 9, where each image is represented by a 784-dimensional vector.
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or splitting methodology) for training, validation, or testing.
Hardware Specification | Yes | All the experimental results were obtained on a Windows 10 workstation equipped with an Intel Core i5-8400 processor and 8GB RAM.
Software Dependencies | No | The paper states that the algorithms are "implemented in C++", but it does not give version numbers for the C++ compiler or for any libraries used.
Experiment Setup | Yes | We set z = 200 (i.e., 1%n) and vary the ratio r/ in 0-0.5. Further, we set the value MinPts = n/1000 and 2n/1000 for each dataset and show the running times in Figure 3.
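
The parameter arithmetic quoted in the Experiment Setup entry can be checked with a short sketch. It only restates the quoted numbers: the dataset sizes come from the Open Datasets entry, and z is taken to be 1% of n as stated. The function names, the rounding to the nearest integer, and the lower bound of 1 are assumptions made for illustration; the quoted text does not say how fractional values were handled.

```python
# Dataset sizes n as quoted in the Open Datasets entry.
DATASET_SIZES = {"NEURIPS": 11463, "USPSHW": 7291, "MNIST": 10000}


def min_pts_settings(n):
    """Return the two MinPts values n/1000 and 2n/1000 for a dataset of size n.

    Assumption: values are rounded to the nearest integer and kept >= 1;
    the quoted setup does not specify the rounding rule.
    """
    return tuple(max(1, round(k * n / 1000)) for k in (1, 2))


def z_setting(n):
    """Return z = 1% of n, as in the quoted 'z = 200 (i.e., 1%n)'."""
    return round(n / 100)


if __name__ == "__main__":
    for name, n in DATASET_SIZES.items():
        print(name, n, min_pts_settings(n))
    # z = 200 corresponds to n = 20000 under the 1% rule.
    print("z for n = 20000:", z_setting(20000))
```

Under these assumptions, z = 200 implies a dataset of n = 20000 points, which is larger than any of the three real datasets listed, so that setting presumably applies to a different (for example synthetic) dataset.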