Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means

Authors: Rong Yin, Yong Liu, Weiping Wang, Dan Meng

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the numerical experiments on simulated data and real-world datasets validate our theoretical analysis.
Researcher Affiliation | Academia | Rong Yin (1, 2), Yong Liu (3, 4), Weiping Wang (1, 2), Dan Meng (1, 2). (1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (2) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (3) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (4) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China.
Pseudocode | Yes | Algorithm 1: Unified Randomized Sketches Kernel k-Means
Open Source Code | No | We provide Pseudo code, data, and instructions. Lucky to be accepted, the code will be provided.
Open Datasets | Yes | The 9 real datasets: dna, segment, mushrooms, pendigits, protein, a8a, w7a, connect-4, and covtype, which are from LIBSVM website (footnote 2: http://www.csie.ntu.edu.cn/~cjlin/libsvm).
Dataset Splits | No | Generating 10,000 samples for training and 10,000 samples for testing. The number of training samples in each clustering is 10000/k. [...] 70 percent of the data in each dataset is used for training experiments, and the rest is used for testing.
Hardware Specification | Yes | The server is 32 cores (2.40GHz) and 32 GB of RAM.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | Each experiment is repeated 5 times. [...] The number of training samples in each clustering is 10000/k. [...] m = 150. The Gaussian kernel is $\exp(-\|x - x'\|^2/\sigma^2)$, where $\sigma = \sqrt{\sum_{i=1}^{n} \cdots}$.
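The table describes the setup but includes no code, so what follows is a minimal Python/NumPy sketch of how that setup could be exercised end to end, under our own assumptions. It approximates the Gaussian kernel matrix $K$ by $KS(S^\top K S)^{+}S^\top K$ for a random $n \times m$ sketch $S$ (either a dense Gaussian sketch or Nyström-style column sub-sampling). It embeds the data as $Z = KSF$, where $FF^\top = (S^\top K S)^{+}$, and then runs Lloyd's k-means with farthest-first seeding on $Z$. This is a generic sketched kernel k-means, not a transcription of the paper's Algorithm 1. Further assumptions: `m` is treated as the sketch size (the role we assume m = 150 plays in the setup), `sigma` is left as a free parameter because the bandwidth formula is only partly recoverable, and `make_blobs` (clusters 10 units apart, spread 0.5) is a hypothetical stand-in for the unspecified simulated data. All function names are illustrative.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Pairwise exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)


def make_blobs(n_total, k, dim=2, spread=0.5, seed=0):
    """Hypothetical simulator: k Gaussian blobs with n_total // k points each."""
    rng = np.random.default_rng(seed)
    per = n_total // k                      # "10000/k" samples per cluster
    centers = np.zeros((k, dim))
    centers[:, 0] = 10.0 * np.arange(k)     # assumed spacing: 10 units apart
    X = np.vstack([c + spread * rng.standard_normal((per, dim)) for c in centers])
    return X, np.repeat(np.arange(k), per)


def split_70_30(X, y, seed=0):
    """Random 70/30 train/test split (integer arithmetic avoids float rounding)."""
    perm = np.random.default_rng(seed).permutation(len(X))
    cut = (7 * len(X)) // 10
    return X[perm[:cut]], y[perm[:cut]], X[perm[cut:]], y[perm[cut:]]


def make_sketch(n, m, kind, rng):
    """n x m sketch matrix S: dense Gaussian, or column sub-sampling (Nystrom)."""
    if kind == "gaussian":
        return rng.standard_normal((n, m)) / np.sqrt(m)
    if kind == "subsample":
        S = np.zeros((n, m))
        S[rng.choice(n, size=m, replace=False), np.arange(m)] = 1.0
        return S
    raise ValueError(f"unknown sketch kind: {kind}")


def assign(Z, C):
    """Index of the nearest center (squared Euclidean) for each row of Z."""
    return ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)


def farthest_first(Z, k):
    """Deterministic seeding: start at row 0, then repeatedly take the farthest row."""
    idx = [0]
    d2 = ((Z - Z[0]) ** 2).sum(1)
    for _ in range(1, k):
        idx.append(int(d2.argmax()))
        d2 = np.minimum(d2, ((Z - Z[idx[-1]]) ** 2).sum(1))
    return idx


def fit(X, k, m, sigma, kind="gaussian", seed=0, iters=100):
    rng = np.random.default_rng(seed)
    S = make_sketch(len(X), m, kind, rng)
    KS = gaussian_kernel(X, X, sigma) @ S           # n x m; forms the full K (demo only)
    vals, vecs = np.linalg.eigh(S.T @ KS)           # S^T K S is symmetric PSD
    keep = vals > 1e-8 * vals.max()                 # drop numerically zero directions
    F = vecs[:, keep] / np.sqrt(vals[keep])         # F @ F.T = (S^T K S)^+
    Z = KS @ F                                      # Z @ Z.T = K S (S^T K S)^+ S^T K
    C = Z[farthest_first(Z, k)]
    for _ in range(iters):                          # Lloyd's k-means in sketched space
        labels = assign(Z, C)
        new_C = np.array([Z[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return {"X": X, "S": S, "F": F, "C": C, "sigma": sigma}


def predict(model, X_new):
    """Embed new points with the same S and F, then assign to the nearest center."""
    Z_new = gaussian_kernel(X_new, model["X"], model["sigma"]) @ model["S"] @ model["F"]
    return assign(Z_new, model["C"])


# Tiny demo so that forming the full kernel matrix stays cheap.
X, y = make_blobs(600, k=3)
X_tr, y_tr, X_te, y_te = split_70_30(X, y)
model = fit(X_tr, k=3, m=20, sigma=5.0, kind="subsample")
pred = predict(model, X_te)
```

At the 10,000-sample scale, forming the full kernel matrix takes $10^8$ entries (about 800 MB in float64). A scalable variant would compute $KS$ in row blocks or, for sub-sampling, evaluate only the $m$ sampled columns of $K$.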