reproducibilityindex.ai

On Integrated Clustering and Outlier Detection

Authors: Lionel Ott, Linsey Pang, Fabio T Ramos, Sanjay Chawla

NeurIPS 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluation on synthetic and real data sets attests to both the quality and scalability of our proposed method. Experiments on synthetic and real data sets are the focus of Section 5 before concluding with Section 6.
Researcher Affiliation	Academia	Lionel Ott, University of Sydney, lott4241@uni.sydney.edu.au; Linsey Pang, University of Sydney, qlinsey@it.usyd.edu.au; Fabio Ramos, University of Sydney, fabio.ramos@sydney.edu.au; Sanjay Chawla, University of Sydney, sanjay.chawla@sydney.edu.au
Pseudocode	Yes	A high level algorithm description is given in Algorithm 1. Algorithm 1: Lagrangian Relaxation()
Open Source Code	No	The paper does not contain an explicit statement or link providing access to the source code for the described methodology.
Open Datasets	Yes	We use synthetic datasets for controlled performance evaluation and comparison between the different methods. Then, we present clustering and outlier results obtained on the MNIST image data set.
Dataset Splits	No	The paper mentions using synthetic datasets and the MNIST dataset, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification	No	The paper mentions memory usage differences ('APOC requires around 2200 MB of memory while LR only needs 370 MB') which implies execution on hardware, but it does not specify any particular CPU models, GPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies	No	The paper mentions comparing results with CPLEX ('solving the LP relaxation using CPLEX'), but it does not provide specific version numbers for CPLEX or any other software libraries or dependencies used in their implementation.
Experiment Setup	Yes	Both LR and APOC require a cost for creating clusters. We obtain this value as α · median(d_ij), i.e. the median of all distances multiplied by a scaling factor α, which typically is in the range [1, 30]. The initial centroids required by k-means-- are found using k-means++ [14]. In Table 1 we show results for LR with values of cost scaling factor α = {5, 15, 25}, APOC with α = 15 and k-means-- with k = {10, 40}.
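
The cluster creation cost quoted in the Experiment Setup row, α times the median of all distances, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cluster_creation_cost`, the use of Euclidean distance, and taking the median over distinct pairs only (i < j) are assumptions made for illustration, since the quoted text does not specify them.

```python
from itertools import combinations
from math import dist
from statistics import median


def cluster_creation_cost(points, alpha):
    """Return alpha * median(d_ij) over all distinct pairs of points.

    Assumptions (not stated in the quoted text): d_ij is the Euclidean
    distance, and only pairs with i < j enter the median.
    """
    pairwise = [dist(p, q) for p, q in combinations(points, 2)]
    return alpha * median(pairwise)


# Toy data, purely illustrative; the pairwise distances are 5, 10 and 5.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Scaling factors reported for LR in the quoted setup: alpha = {5, 15, 25}.
costs = {alpha: cluster_creation_cost(points, alpha) for alpha in (5, 15, 25)}
```

For large data sets the full list of pairwise distances grows quadratically, so a sampled estimate of the median may be preferable; that choice is likewise not described in the quoted text.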