Sparse Embedded $k$-Means Clustering
Authors: Weiwei Liu, Xiaobo Shen, Ivor Tsang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate k-means clustering, while achieving satisfactory clustering performance. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; School of Computer Science and Engineering, Nanyang Technological University; Centre for Artificial Intelligence, University of Technology Sydney. {liuweiwei863,njust.shenxiaobo}@gmail.com, ivor.tsang@uts.edu.au |
| Pseudocode | Yes | Algorithm 1: Sparse Embedded $k$-Means Clustering. Input: $X \in \mathbb{R}^{n \times d}$; number of clusters $k$. Output: $\epsilon$-approximate solution for problem 1. 1: Set $\hat{d} = O\left(\max\left(\frac{k + \log(1/\delta)}{\epsilon^2}, \frac{6}{\epsilon^2 \delta}\right)\right)$. 2: Build a random map $h$ so that for any $i \in [d]$, $h(i) = j$ for $j \in [\hat{d}]$ with probability $1/\hat{d}$. 3: Construct matrix $\Phi \in \{0, 1\}^{d \times \hat{d}}$ with $\Phi_{i,h(i)} = 1$, and all remaining entries 0. 4: Construct matrix $Q \in \mathbb{R}^{d \times d}$, a random diagonal matrix whose entries are i.i.d. Rademacher variables. 5: Compute the product $\hat{X} = XQ\Phi$ and run exact or approximate $k$-means algorithms on $\hat{X}$. |
| Open Source Code | No | The paper mentions using code from websites for baseline methods (LLE, LS, PD, k-means) but does not provide a link or statement about the availability of their own proposed method's source code. |
| Open Datasets | Yes | This section evaluates the performance of the proposed method on four real-world data sets: COIL20, SECTOR, RCV1 and ILSVRC2012. The COIL20 [20] and ILSVRC2012 [21] data sets are collected from website (footnotes 3, 4), and other data sets are collected from the LIBSVM website (footnote 5). Footnote 3: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php Footnote 4: http://www.image-net.org/challenges/LSVRC/2012/ Footnote 5: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | No | The paper evaluates performance on several datasets but does not explicitly detail training, validation, and test dataset splits, percentages, or sample counts for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'standard k-means clustering package' and references code for baselines, but does not provide specific version numbers for any ancillary software dependencies (e.g., libraries, frameworks) used for their implementation. |
| Experiment Setup | No | The paper mentions running baseline methods 'with default parameters' but does not specify concrete hyperparameters, training configurations, or system-level settings for its own proposed method or the overall experimental setup. |
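
Steps 2 to 5 of Algorithm 1 in the Pseudocode row can be sketched in a few lines of NumPy. This is an illustrative reading of the pseudocode, not the authors' implementation (the Open Source Code row notes that none was released). The function name `sparse_embed`, the `seed` parameter, and the choice to pass the reduced dimension `d_hat` directly are assumptions: step 1 only gives `d_hat` up to a constant, so the caller has to pick a concrete value.

```python
import numpy as np


def sparse_embed(X, d_hat, seed=0):
    """Return X_hat = X Q Phi plus the hash map h and the Rademacher signs.

    Hypothetical helper: d_hat is supplied by the caller because the
    O(.) bound in step 1 does not fix a concrete value.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 2: hash each original feature i to a uniformly random bucket h[i].
    h = rng.integers(0, d_hat, size=d)
    # Step 3: Phi has exactly one 1 per row, at column h[i].
    Phi = np.zeros((d, d_hat))
    Phi[np.arange(d), h] = 1.0
    # Step 4: diagonal of Q, i.i.d. +1/-1 with equal probability.
    signs = rng.choice(np.array([-1.0, 1.0]), size=d)
    # Step 5: X Q Phi. Scaling the columns of X by `signs` is the same
    # as multiplying by the diagonal matrix Q = diag(signs).
    X_hat = (X * signs) @ Phi
    return X_hat, h, signs


# Toy data (assumed for illustration): 100 points in 50 dimensions.
X = np.random.default_rng(1).normal(size=(100, 50))
X_hat, h, signs = sparse_embed(X, d_hat=10)
print(X_hat.shape)  # (100, 10)
```

The embedded matrix `X_hat` can then be handed to any exact or approximate k-means routine, as step 5 describes. The dense `Phi` is kept here for readability; since it has only one nonzero per row, a sparse matrix representation would likely be preferable for large `d`.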