Large Scale Sparse Clustering

Authors: Ruqi Zhang, Zhiwu Lu

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm.
Researcher Affiliation | Academia | Ruqi Zhang and Zhiwu Lu, Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China
Pseudocode | Yes | Algorithm 1: Large-Scale Sparse Clustering (LSSC)
Open Source Code | No | The paper mentions code availability for a compared method (Nyström), but it gives no access details and no explicit statement that the LSSC code is open-sourced.
Open Datasets | Yes | We further evaluate our LSSC algorithm on two real-world datasets from Yann LeCun's homepage (footnote 3) and the UCI repository (footnote 4). Their statistical characteristics are listed in Table 1, and below is a brief description of each dataset: MNIST: a dataset of handwritten digits, and each digit is represented using 784 features. Covtype: a dataset to predict forest cover type from cartographic variables only, originally with 54 features. Footnote 3: http://yann.lecun.com/exdb/mnist/ Footnote 4: http://archive.ics.uci.edu/ml
Dataset Splits | No | The paper does not give explicit training/validation/test splits (e.g., percentages or sample counts) needed to reproduce the experiment setup. It describes evaluation on datasets with added noise, but no specific data partitioning.
Hardware Specification | Yes | all the methods are implemented in MATLAB R2014a and run on a 3.40 GHz, 32GB RAM Core 2 Duo PC.
Software Dependencies | Yes | all the methods are implemented in MATLAB R2014a
Experiment Setup | Yes | In the experiments, we produce new noisy datasets by adding two types of noise (uniform noise and Gaussian noise) of different levels (i.e., 0%, 15%, and 30%) to the original datasets. We find that our LSSC algorithm is not sensitive to λ in our experiments, and thus fix this parameter at λ = 0.01 for all the datasets. By considering a tradeoff of running efficiency and effectiveness, we uniformly set k = 1,000 and empirically set r = 4, p = 13 for MNIST and r = 2, p = 9 for Covtype.
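
The experiment setup quoted above can be sketched in a few lines of Python to make the reported settings concrete. This is only an illustrative sketch, not the authors' MATLAB code. The hyperparameter values (λ, k, r, p) and the three noise levels come from the quoted text; the names `LSSC_SETTINGS`, `NOISE_LEVELS`, and `add_noise`, the `scale` parameter, and above all the reading of a "noise level" as the fraction of entries that get perturbed are assumptions, since the summary does not define how the level is applied.

```python
import random

# Hyperparameters as quoted in the Experiment Setup row.
# The dictionary layout and key names are illustrative, not from the paper.
LSSC_SETTINGS = {
    "MNIST": {"lam": 0.01, "k": 1000, "r": 4, "p": 13},
    "Covtype": {"lam": 0.01, "k": 1000, "r": 2, "p": 9},
}
NOISE_LEVELS = (0.0, 0.15, 0.30)  # 0%, 15%, and 30%


def add_noise(data, level, kind="uniform", scale=1.0, seed=0):
    """Return a noisy copy of `data`, given as a list of feature lists.

    ASSUMPTION: `level` is the fraction of entries that are perturbed,
    and `scale` is the half-width (uniform) or standard deviation
    (Gaussian) of the additive noise. Neither is specified in the summary.
    """
    if kind not in ("uniform", "gaussian"):
        raise ValueError("kind must be 'uniform' or 'gaussian'")
    rng = random.Random(seed)
    noisy = [list(row) for row in data]  # copy so the input stays intact
    positions = [(i, j) for i, row in enumerate(noisy) for j in range(len(row))]
    n_corrupt = round(level * len(positions))
    # Pick distinct entries to perturb, then add the chosen noise type.
    for i, j in rng.sample(positions, n_corrupt):
        if kind == "uniform":
            noisy[i][j] += rng.uniform(-scale, scale)
        else:
            noisy[i][j] += rng.gauss(0.0, scale)
    return noisy
```

Under these assumptions, one noisy variant per noise type and level would be produced by looping over `("uniform", "gaussian")` and `NOISE_LEVELS`, with level 0.0 returning an unmodified copy of the data.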