Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut
Authors: Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 12 real-world benchmark and 8 facial datasets validate the advantages of the proposed algorithm compared to the state-of-the-art clustering algorithms. In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively. |
| Researcher Affiliation | Academia | Shenfei Pei, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, shenfeipei@gmail.com; Feiping Nie, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, feipingnie@gmail.com; Rong Wang, School of Cybersecurity and Center for OPTIMAL, Northwestern Polytechnical University, wangrong07@tsinghua.org.cn; Xuelong Li, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, li@nwpu.edu.cn |
| Pseudocode | Yes | Algorithm 1: An efficient program for solving problem (21). |
| Open Source Code | Yes | In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively [GitHub]. |
| Open Datasets | Yes | WebFace [50] and CelebA [23] are two large-scale public datasets available for face recognition and verification problems. CALFW [54] and CPLFW [53] are two variants of LFW aiming at cross-age and cross-pose face recognition, respectively. CACD [5], Adience [15], and FERET [35] are constructed for cross-age face retrieval, age and gender recognition, and facial recognition system evaluation. |
| Dataset Splits | No | The paper does not explicitly provide details about validation dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Both k-means and our code run on the Arch machine with 3.20 GHz i7-8700 CPU, 32 GB main memory. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn', 'C++', 'Dlib', and 'EFANNA', but it does not specify exact version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The number of nearest neighbors k is fixed at 20 for 6 synthetic and 12 middle-scale real-world datasets. The k-nearest neighbors graphs are generated by EFANNA [14] with k = 100 for all facial datasets. Every method takes 50 runs. The average results are reported. |
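
The experiment setup in the table (a k-nearest-neighbor graph with fixed k, and results averaged over 50 runs) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `knn_graph` is a hypothetical brute-force stand-in for EFANNA (which builds approximate graphs), and `average_over_runs` and its `run_once` callback are assumed names for a generic averaging harness.

```python
import math
import random
from statistics import mean


def knn_graph(points, k):
    """Brute-force k-NN graph (assumption: exact search, unlike EFANNA).

    Returns a dict mapping each point index to the indices of its k
    nearest other points, ordered by increasing Euclidean distance.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def average_over_runs(run_once, n_runs=50):
    """Call run_once(seed) for seeds 0..n_runs-1 and return the mean score.

    run_once is a hypothetical callback that runs one clustering trial
    with the given seed and returns a scalar quality score.
    """
    return mean(run_once(seed) for seed in range(n_runs))


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    graph = knn_graph(pts, k=20)  # k = 20, as for the non-facial datasets
    print(len(graph), len(graph[0]))
```

Brute-force search costs O(n²) distance computations, so it is only practical for small inputs; for datasets at the scale reported in the table, an approximate method would be needed.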