Large Scale Sparse Clustering
Authors: Ruqi Zhang, Zhiwu Lu
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm. |
| Researcher Affiliation | Academia | Ruqi Zhang and Zhiwu Lu, Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China |
| Pseudocode | Yes | Algorithm 1 Large-Scale Sparse Clustering (LSSC) |
| Open Source Code | No | The paper mentions code availability for a compared method (Nyström) but does not provide concrete access or an explicit statement about the open-sourcing of the LSSC algorithm's code. |
| Open Datasets | Yes | We further evaluate our LSSC algorithm on two real-world datasets from Yann LeCun's homepage [3] and the UCI repository [4]. Their statistical characteristics are listed in Table 1, and below is a brief description of each dataset: MNIST: a dataset of handwritten digits, and each digit is represented using 784 features. Covtype: a dataset to predict forest cover type from cartographic variables only, originally with 54 features. [3] http://yann.lecun.com/exdb/mnist/ [4] http://archive.ics.uci.edu/ml |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment setup. It discusses evaluating on datasets with added noise, but not specific data partitioning. |
| Hardware Specification | Yes | all the methods are implemented in MATLAB R2014a and run on a 3.40 GHz, 32GB RAM Core 2 Duo PC. |
| Software Dependencies | Yes | all the methods are implemented in MATLAB R2014a |
| Experiment Setup | Yes | In the experiments, we produce new noisy datasets by adding two types of noise (uniform noise and Gaussian noise) of different levels (i.e., 0%, 15%, and 30%) to the original datasets. We find that our LSSC algorithm is not sensitive to λ in our experiments, and thus fix this parameter at λ = 0.01 for all the datasets. By considering a tradeoff of running efficiency and effectiveness, we uniformly set k = 1,000 and empirically set r = 4, p = 13 for MNIST and r = 2, p = 9 for Covtype. |
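
The setup in the last row can be written down as a small configuration plus a noise-injection helper. This is a minimal sketch, not the authors' code: the names `LSSC_SETTINGS` and `add_noise` are hypothetical, and the reading of "noise level" as a fraction of each feature's value range is an assumption, since the summary does not say how the levels are defined. Only the values of λ, k, r, p and the noise types and levels come from the table.

```python
import random

# Hyperparameters as reported in the Experiment Setup row.
LSSC_SETTINGS = {
    "MNIST": {"lambda": 0.01, "k": 1000, "r": 4, "p": 13},
    "Covtype": {"lambda": 0.01, "k": 1000, "r": 2, "p": 9},
}
NOISE_LEVELS = (0.0, 0.15, 0.30)
NOISE_TYPES = ("uniform", "gaussian")


def add_noise(rows, kind, level, seed=0):
    """Return a noisy copy of `rows` (a list of equal-length feature lists).

    ASSUMPTION: `level` scales the noise relative to each feature's range
    (max - min). The paper's exact definition is not given in the summary.
    """
    if kind not in NOISE_TYPES:
        raise ValueError(f"unknown noise type: {kind}")
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Per-feature value range, used as the noise scale.
    spans = [
        max(row[j] for row in rows) - min(row[j] for row in rows)
        for j in range(n_features)
    ]
    noisy = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            scale = level * spans[j]
            if kind == "uniform":
                delta = rng.uniform(-scale, scale)
            else:
                delta = rng.gauss(0.0, scale)
            new_row.append(value + delta)
        noisy.append(new_row)
    return noisy
```

Looping over `NOISE_TYPES` and `NOISE_LEVELS` with this helper yields one dataset variant per combination. At level 0 the data is returned unchanged, which corresponds to the noise-free baseline.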