On Integrated Clustering and Outlier Detection

Authors: Lionel Ott, Linsey Pang, Fabio T Ramos, Sanjay Chawla

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation on synthetic and real data sets attest to both the quality and scalability of our proposed method. Experiments on synthetic and real data sets are the focus of Section 5 before concluding with Section 6.
Researcher Affiliation | Academia | Lionel Ott, University of Sydney, lott4241@uni.sydney.edu.au; Linsey Pang, University of Sydney, qlinsey@it.usyd.edu.au; Fabio Ramos, University of Sydney, fabio.ramos@sydney.edu.au; Sanjay Chawla, University of Sydney, sanjay.chawla@sydney.edu.au
Pseudocode | Yes | A high level algorithm description is given in Algorithm 1. Algorithm 1: Lagrangian Relaxation()
Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology.
Open Datasets | Yes | We use synthetic datasets for controlled performance evaluation and comparison between the different methods. Then, we present clustering and outlier results obtained on the MNIST image data set.
Dataset Splits | No | The paper mentions using synthetic datasets and the MNIST dataset, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper mentions memory usage differences ('APOC requires around 2200 MB of memory while LR only needs 370 MB'), which implies execution on hardware, but it does not specify any CPU models, GPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions comparing results with CPLEX ('solving the LP relaxation using CPLEX'), but it does not provide specific version numbers for CPLEX or any other software libraries or dependencies used in the implementation.
Experiment Setup | Yes | Both LR and APOC require a cost for creating clusters. We obtain this value as α · median(d_ij), i.e. the median of all distances multiplied by a scaling factor α which typically is in the range [1, 30]. The initial centroids required by k-means-- are found using k-means++ [14]. In Table 1 we show results for LR with values of cost scaling factor α = {5, 15, 25}, APOC with α = 15 and k-means-- with k = {10, 40}.
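
The two setup steps quoted in the Experiment Setup entry (a cluster creation cost of α times the median pairwise distance, and k-means++ seeding for the initial centroids) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, Euclidean distance is assumed, and the median is assumed to be taken over distinct pairs i < j, which the quoted text does not specify.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix d_ij for an (n, d) array (assumed metric)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_creation_cost(points, alpha):
    """Return alpha * median(d_ij) over distinct pairs i < j (an assumption)."""
    dist = pairwise_distances(points)
    upper = np.triu_indices(len(points), k=1)  # skip the zero diagonal
    return alpha * np.median(dist[upper])


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centroid uniformly at random, each further
    centroid sampled with probability proportional to its squared distance
    to the nearest centroid chosen so far."""
    n = len(points)
    centers = [points[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()  # assumes at least one point is not yet a center
        centers.append(points[rng.choice(n, p=probs)])
    return np.array(centers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(200, 2))  # synthetic data, for illustration only
    for alpha in (5, 15, 25):           # the scaling factors reported for LR
        print(alpha, cluster_creation_cost(points, alpha))
    print(kmeans_pp_init(points, k=10, rng=rng).shape)
```

Because the cost scales linearly with α, larger values make opening a cluster more expensive, so α acts as a single knob in place of a fixed number of clusters k.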