Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means
Authors: Rong Yin, Yong Liu, Weiping Wang, Dan Meng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the numerical experiments on simulated data and real-world datasets validate our theoretical analysis. |
| Researcher Affiliation | Academia | Rong Yin 1,2, Yong Liu 3,4, Weiping Wang 1,2, Dan Meng 1,2; 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; 3 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 4 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Unified Randomized Sketches Kernel k-Means |
| Open Source Code | No | We provide Pseudo code, data, and instructions. Lucky to be accepted, the code will be provided. |
| Open Datasets | Yes | The 9 real datasets: dna, segment, mushrooms, pendigits, protein, a8a, w7a, connect-4, and covtype, which are from LIBSVM website (http://www.csie.ntu.edu.cn/~cjlin/libsvm). |
| Dataset Splits | No | Generating 10,000 samples for training and 10,000 samples for testing. The number of training samples in each clustering is 10000/k. [...] 70 percent of the data in each dataset is used for training experiments, and the rest is used for testing. |
| Hardware Specification | Yes | The server is 32 cores (2.40GHz) and 32 GB of RAM. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Each experiment is repeated 5 times. [...] The number of training samples in each clustering is 10000/k. [...] m = 150. The Gaussian kernel is $\exp(-\lVert x - x' \rVert^2 / \sigma^2)$, where $\sigma = \sqrt{\sum^{n} \cdots}$ [...]. |
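
The kernel quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the standard Gaussian form exp(-‖x − x′‖² / σ²) and evaluates it between all samples and m = 150 uniformly sub-sampled points. The data, the uniform sub-sampling, the function name `gaussian_kernel`, and the value `sigma = 1.0` are all assumptions for illustration: the paper's formula for σ is truncated in the quoted text, and this is not the authors' Algorithm 1.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Return K with K[i, j] = exp(-||X[i] - Y[j]||^2 / sigma^2)."""
    sq_x = np.sum(X ** 2, axis=1)[:, None]   # squared norms of rows of X, shape (n, 1)
    sq_y = np.sum(Y ** 2, axis=1)[None, :]   # squared norms of rows of Y, shape (1, m)
    # Expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y and clip tiny negatives
    # caused by floating-point round-off.
    sq_dist = np.maximum(sq_x + sq_y - 2.0 * (X @ Y.T), 0.0)
    return np.exp(-sq_dist / sigma ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # hypothetical data, not one of the paper's datasets
m = 150                          # sketch size quoted in the Experiment Setup row
sigma = 1.0                      # placeholder bandwidth (assumption; the paper's rule is truncated)

# Uniform sub-sampling without replacement: one simple way to pick m columns.
landmarks = rng.choice(X.shape[0], size=m, replace=False)
K_nm = gaussian_kernel(X, X[landmarks], sigma)   # n x m block of the kernel matrix
```

Working with the n × m block instead of the full n × n kernel matrix is what keeps memory and time manageable when n is large; any clustering step built on top of `K_nm` would need the details of the paper's algorithm, which are not reproduced here.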