A Practical Algorithm for Distributed Clustering and Outlier Detection

Authors: Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang

NeurIPS 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on both real and synthetic data have demonstrated the clear superiority of our algorithm against all the baseline algorithms in almost all metrics.
Researcher Affiliation Academia Jiecao Chen, Indiana University Bloomington, Bloomington, IN, jiecchen@indiana.edu; Erfan Sadeqi Azer, Indiana University Bloomington, Bloomington, IN, esadeqia@indiana.edu; Qin Zhang, Indiana University Bloomington, Bloomington, IN, qzhangcs@indiana.edu
Pseudocode Yes Algorithm 1: Summary-Outliers(X, k, t)
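Algorithm 1 itself is not reproduced in this report. As a rough illustration only, the Python sketch below builds a weighted summary of one site's data by successive sampling, the general idea behind summarizing data for clustering with outliers. Every detail of the loop is an assumption rather than the paper's method: the hypothetical function name `summary_outliers_sketch`, sampling about α·k·ln n points per round, assigning the closer half of the remaining points to their nearest sampled point, and stopping once at most β·t points remain. The defaults α = 2 and β = 4.5 mirror the values reported under Experiment Setup, but their roles here are guessed.

```python
import math
import random


def summary_outliers_sketch(points, k, t, alpha=2.0, beta=4.5, seed=0):
    """Hypothetical successive-sampling summary, NOT the paper's Algorithm 1.

    ASSUMPTIONS: each round samples ceil(alpha * k * ln n) points with
    replacement, assigns the closer half of the remaining points to their
    nearest sampled point, and stops once at most beta * t points remain.
    Returns a dict mapping summary point -> weight (points it represents).
    """
    rng = random.Random(seed)
    n = len(points)
    sample_size = max(1, math.ceil(alpha * k * math.log(max(n, 2))))
    remaining = list(points)
    weights = {}
    while len(remaining) > beta * t:
        # Uniform sample with replacement from the points not yet summarized.
        sample = [rng.choice(remaining) for _ in range(sample_size)]
        # Pair every remaining point with its nearest sampled point.
        scored = []
        for p in remaining:
            c = min(sample, key=lambda s: math.dist(p, s))
            scored.append((math.dist(p, c), p, c))
        scored.sort(key=lambda e: e[0])
        half = (len(remaining) + 1) // 2
        # The closer half is absorbed into the weights of its sampled centers.
        for _, p, c in scored[:half]:
            weights[c] = weights.get(c, 0) + 1
        # The farther half moves on to the next round.
        remaining = [p for _, p, _ in scored[half:]]
    # Leftover points (outlier candidates) are kept individually with weight 1.
    for p in remaining:
        weights[p] = weights.get(p, 0) + 1
    return weights
```

The weights always sum to the number of input points, so in a setup like the one described, a coordinator could run a weighted clustering-with-outliers routine on the union of the site summaries.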
Open Source Code No The paper does not provide a direct link or explicit statement about the public availability of the source code for the described methodology.
Open Datasets Yes kdd Full. This dataset is from the 1999 KDD Cup competition and contains instances describing connections of sequences of TCP packets.
Dataset Splits No The paper mentions data is 'randomly partitioned among the sites' but does not provide specific percentages or counts for training, validation, or test splits. It does not mention a 'validation' set specifically.
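The report does not say how points were distributed to sites beyond "randomly partitioned". A minimal sketch, assuming each point is sent independently and uniformly at random to one of `num_sites` sites; the function name `random_partition` and the seed handling are hypothetical:

```python
import random


def random_partition(points, num_sites, seed=0):
    """Assign each point independently and uniformly at random to a site.

    ASSUMPTION: per-point uniform assignment; the paper does not specify
    whether sites received exactly equal shares.
    """
    rng = random.Random(seed)
    sites = [[] for _ in range(num_sites)]
    for p in points:
        sites[rng.randrange(num_sites)].append(p)
    return sites
```

Independent assignment yields sites of roughly, not exactly, equal size; shuffling the data and slicing it into equal chunks would be the alternative if equal shares were intended.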
Hardware Specification Yes All experiments are conducted on a PowerEdge R730 server equipped with 2 x Intel Xeon E5-2667 v3 3.2GHz. This server has 8 cores/16 threads per CPU, 192GB memory and a 1.6TB SSD.
Software Dependencies No The paper mentions 'C++ with Boost.MPI support' and 'Armadillo (Sanderson, 2010) as the numerical linear algebra library' but does not specify version numbers for these software dependencies.
Experiment Setup Yes We fix α = 2 and β = 4.5 in the subroutine (Algorithm 1). ... k = 3; t = 8752 for kdd Sp and t = 45747 for kdd Full.