On Integrated Clustering and Outlier Detection
Authors: Lionel Ott, Linsey Pang, Fabio T Ramos, Sanjay Chawla
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on synthetic and real data sets attest to both the quality and scalability of our proposed method. Experiments on synthetic and real data sets are the focus of Section 5 before concluding with Section 6. |
| Researcher Affiliation | Academia | Lionel Ott, University of Sydney, lott4241@uni.sydney.edu.au; Linsey Pang, University of Sydney, qlinsey@it.usyd.edu.au; Fabio Ramos, University of Sydney, fabio.ramos@sydney.edu.au; Sanjay Chawla, University of Sydney, sanjay.chawla@sydney.edu.au |
| Pseudocode | Yes | A high level algorithm description is given in Algorithm 1. Algorithm 1: Lagrangian Relaxation() |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology. |
| Open Datasets | Yes | We use synthetic datasets for controlled performance evaluation and comparison between the different methods. Then, we present clustering and outlier results obtained on the MNIST image data set. |
| Dataset Splits | No | The paper mentions using synthetic datasets and the MNIST dataset, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper mentions memory usage differences ('APOC requires around 2200 MB of memory while LR only needs 370 MB') which implies execution on hardware, but it does not specify any particular CPU models, GPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions comparing results with CPLEX ('solving the LP relaxation using CPLEX'), but it does not provide specific version numbers for CPLEX or any other software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | Both LR and APOC require a cost for creating clusters. We obtain this value as α · median(d_ij), i.e. the median of all distances multiplied by a scaling factor α which typically is in the range [1, 30]. The initial centroids required by k-means-- are found using k-means++ [14]. In Table 1 we show results for LR with values of cost scaling factor α = {5, 15, 25}, APOC with α = 15 and k-means-- with k = {10, 40}. |
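
The cluster-creation cost quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code. It assumes Euclidean distances and takes the median over unique point pairs, neither of which the quoted text specifies; the function name `cluster_creation_cost` and the toy points are hypothetical.

```python
import itertools
import math
import statistics


def cluster_creation_cost(points, alpha):
    """Return alpha times the median pairwise distance.

    Assumptions (not stated in the source): Euclidean distance,
    and the median is taken over unique unordered pairs of points.
    """
    distances = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return alpha * statistics.median(distances)


# Hypothetical toy data; the alpha values mirror those quoted for LR.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
costs = {alpha: cluster_creation_cost(points, alpha) for alpha in (5, 15, 25)}
```

Because the cost scales linearly with α, larger values make opening a new cluster more expensive, which would be expected to yield fewer clusters.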