reproducibilityindex.ai

Large Scale Sparse Clustering

Authors: Ruqi Zhang, Zhiwu Lu

IJCAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm.
Researcher Affiliation	Academia	Ruqi Zhang and Zhiwu Lu Beijing Key Laboratory of Big Data Management and Analysis Methods School of Information, Renmin University of China, Beijing 100872, China
Pseudocode	Yes	Algorithm 1 Large-Scale Sparse Clustering (LSSC)
Open Source Code	No	The paper mentions code availability for a compared method (Nyström) but does not provide concrete access or an explicit statement about the open-sourcing of the LSSC algorithm's code.
Open Datasets	Yes	We further evaluate our LSSC algorithm on two real-world datasets from Yann LeCun's homepage [3] and the UCI repository [4]. Their statistical characteristics are listed in Table 1, and below is a brief description of each dataset: MNIST: a dataset of handwritten digits, and each digit is represented using 784 features. Covtype: a dataset to predict forest cover type from cartographic variables only, originally with 54 features. [3] http://yann.lecun.com/exdb/mnist/ [4] http://archive.ics.uci.edu/ml
Dataset Splits	No	The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment setup. It discusses evaluating on datasets with added noise, but not specific data partitioning.
Hardware Specification	Yes	all the methods are implemented in MATLAB R2014a and run on a 3.40 GHz, 32GB RAM Core 2 Duo PC.
Software Dependencies	Yes	all the methods are implemented in MATLAB R2014a
Experiment Setup	Yes	In the experiments, we produce new noisy datasets by adding two types of noise (uniform noise and Gaussian noise) of different levels (i.e., 0%, 15%, and 30%) to the original datasets. We find that our LSSC algorithm is not sensitive to λ in our experiments, and thus fix this parameter at λ = 0.01 for all the datasets. By considering a tradeoff of running efficiency and effectiveness, we uniformly set k = 1,000 and empirically set r = 4, p = 13 for MNIST and r = 2, p = 9 for Covtype.
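
The Experiment Setup row can be sketched as a small configuration plus a noise-injection helper. This is only an illustration, not the authors' MATLAB code: the function `add_noise`, the constant names, and above all the reading of "noise level" as a fraction of each feature's value range are assumptions, since the summary does not say how the 15% and 30% levels are defined.

```python
import random

# Settings quoted in the Experiment Setup row.
NOISE_LEVELS = (0.0, 0.15, 0.30)
LAMBDA = 0.01          # fixed for all datasets
K = 1000               # set uniformly for all datasets
PER_DATASET = {"MNIST": {"r": 4, "p": 13}, "Covtype": {"r": 2, "p": 9}}


def add_noise(rows, level, kind, seed=0):
    """Return a noisy copy of `rows` (a list of equal-length float lists).

    ASSUMPTION: `level` scales the noise by each feature's value range
    (max - min). The paper's exact definition is not given in this summary.
    """
    if kind not in ("uniform", "gaussian"):
        raise ValueError("kind must be 'uniform' or 'gaussian'")
    rng = random.Random(seed)
    # Per-feature value range, computed column by column.
    spans = [max(col) - min(col) for col in zip(*rows)]
    noisy = []
    for row in rows:
        new_row = []
        for value, span in zip(row, spans):
            scale = level * span
            if kind == "uniform":
                delta = rng.uniform(-scale, scale)
            else:
                delta = rng.gauss(0.0, scale)
            new_row.append(value + delta)
        noisy.append(new_row)
    return noisy
```

Under these assumptions, a run would loop over both noise kinds and every value in `NOISE_LEVELS`, calling `add_noise(data, level, kind)` before clustering with the per-dataset `r` and `p`.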