Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Integrated Clustering and Outlier Detection
Authors: Lionel Ott, Linsey Pang, Fabio T Ramos, Sanjay Chawla
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on synthetic and real data sets attest to both the quality and scalability of our proposed method. Experiments on synthetic and real data sets are the focus of Section 5 before concluding with Section 6. |
| Researcher Affiliation | Academia | Lionel Ott University of Sydney EMAIL Linsey Pang University of Sydney EMAIL Fabio Ramos University of Sydney EMAIL Sanjay Chawla University of Sydney EMAIL |
| Pseudocode | Yes | A high level algorithm description is given in Algorithm 1. Algorithm 1: Lagrangian Relaxation() |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology. |
| Open Datasets | Yes | We use synthetic datasets for controlled performance evaluation and comparison between the different methods. Then, we present clustering and outlier results obtained on the MNIST image data set. |
| Dataset Splits | No | The paper mentions using synthetic datasets and the MNIST dataset, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper mentions memory usage differences ('APOC requires around 2200 MB of memory while LR only needs 370 MB') which implies execution on hardware, but it does not specify any particular CPU models, GPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions comparing results with CPLEX ('solving the LP relaxation using CPLEX'), but it does not provide specific version numbers for CPLEX or any other software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | Both LR and APOC require a cost for creating clusters. We obtain this value as α · median(d_ij), i.e. the median of all distances multiplied by a scaling factor α which typically is in the range [1, 30]. The initial centroids required by k-means-- are found using k-means++ [14]. In Table 1 we show results for LR with values of cost scaling factor α = {5, 15, 25}, APOC with α = 15 and k-means-- with k = {10, 40}. |
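
The cluster-creation cost quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `cluster_creation_cost` is hypothetical, and both the Euclidean metric and the choice to take the median over distinct pairs i < j are assumptions, since the quoted text only gives the value as α times the median of all distances.

```python
import numpy as np


def cluster_creation_cost(points, alpha):
    """Return alpha * median(d_ij) over distinct pairs i < j.

    Assumptions (not confirmed by the source): distances are Euclidean,
    and self-distances and duplicate (j, i) pairs are excluded.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep only the strict upper triangle, i.e. each pair once.
    upper = np.triu_indices(len(points), k=1)
    return alpha * np.median(dists[upper])


if __name__ == "__main__":
    # Three collinear points with pairwise distances 5, 10 and 5.
    example = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    for alpha in (5, 15, 25):  # the scaling factors quoted for LR
        print(alpha, cluster_creation_cost(example, alpha))
```

A larger α raises the cost of opening a cluster, so sweeping it over values such as {5, 15, 25} changes how many clusters a cost-based method is willing to create.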