Robust Multiple Kernel K-means Using ℓ2,1-Norm

Authors: Liang Du, Peng Zhou, Lei Shi, Hanmo Wang, Mingyu Fan, Wenjian Wang, Yi-Dong Shen

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments well demonstrate the effectiveness of the proposed algorithms. Experimental results on benchmark data sets have shown that the proposed approaches achieve better clustering results in both the single kernel and multiple kernel learning settings."
Researcher Affiliation | Academia | State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; School of Computer and Information Technology, Shanxi University; University of Chinese Academy of Sciences; Institute of Intelligent System and Decision, Wenzhou University
Pseudocode | Yes | "Algorithm 1: The algorithm of RMKKM"
Open Source Code | Yes | "For the purpose of reproducibility, we provide the code at https://github.com/csliangdu/RMKKM."
Open Datasets | Yes | "We collect a variety of data sets, including 6 image data sets and 3 text corpora, most of which have been frequently used to evaluate the performance of different clustering algorithms. The statistics of these data sets are summarized in Table 1." (Table 1 lists: YALE, JAFFE, ORL, AR, COIL20, BA, TR11, TR41, TR45.)
Dataset Splits | No | The paper states, “As suggested in [Yang et al., 2010], we independently repeat the experiments for 20 times with random initializations and report the best results corresponding to the best objective values.” This describes the repetition and result-selection protocol (sketched in code after this table) but does not provide train/validation/test splits (e.g., percentages, sample counts, or cross-validation details) for reproducing a data partition.
Hardware Specification | No | The paper does not contain any information about the specific hardware used to run its experiments (e.g., CPU or GPU models, memory, or cloud instance types).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries or solvers with their versions).
Experiment Setup | Yes | For the proposed method RMKKM, the parameter γ that controls the kernel weight distribution is set to 0.3. The results of all compared algorithms depend on initialization; as suggested in [Yang et al., 2010], the experiments are independently repeated 20 times with random initializations and the best results corresponding to the best objective values are reported. Following the strategy of other multiple kernel learning approaches, 12 kernel functions are used as bases for multiple kernel clustering: seven RBF kernels $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\delta^2))$ with $\delta = t \cdot D_0$, where $D_0$ is the maximum distance between samples and $t$ varies over $\{0.01, 0.05, 0.1, 1, 10, 50, 100\}$; four polynomial kernels $K(x_i, x_j) = (a + x_i^{\top} x_j)^b$ with $a \in \{0, 1\}$ and $b \in \{2, 4\}$; and one cosine kernel $K(x_i, x_j) = (x_i^{\top} x_j) / (\|x_i\| \|x_j\|)$. Finally, all kernels are normalized via $K(x_i, x_j) \leftarrow K(x_i, x_j) / \sqrt{K(x_i, x_i) K(x_j, x_j)}$ and then rescaled to $[0, 1]$ (this construction is sketched below). The number of clusters is set to the true number of classes for all data sets and clustering algorithms.
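
A minimal sketch of the 12-kernel construction described above, assuming a NumPy/SciPy environment; the function and variable names here are illustrative and not taken from the released RMKKM code:

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_base_kernels(X):
    """Build the 12 base kernels from the experiment setup above.

    X: (n_samples, n_features) data matrix with no all-zero rows.
    Returns a list of (n, n) kernel matrices, each rescaled to [0, 1].
    """
    D = cdist(X, X)           # pairwise Euclidean distances
    D0 = D.max()              # maximum distance between samples
    G = X @ X.T               # Gram matrix of inner products
    kernels = []

    # Seven RBF kernels: K = exp(-||xi - xj||^2 / (2 delta^2)), delta = t * D0
    for t in (0.01, 0.05, 0.1, 1, 10, 50, 100):
        delta = t * D0
        kernels.append(np.exp(-D**2 / (2 * delta**2)))

    # Four polynomial kernels: K = (a + xi^T xj)^b
    for a in (0, 1):
        for b in (2, 4):
            kernels.append((a + G) ** b)

    # One cosine kernel: K = (xi^T xj) / (||xi|| ||xj||)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    kernels.append(G / (norms @ norms.T))

    # Normalize K(xi, xj) / sqrt(K(xi, xi) K(xj, xj)), then min-max rescale to [0, 1]
    out = []
    for K in kernels:
        d = np.sqrt(np.diag(K))
        K = K / np.outer(d, d)
        out.append((K - K.min()) / (K.max() - K.min()))
    return out
```

The γ = 0.3 kernel-weight parameter enters only the RMKKM objective itself, so it does not appear in this kernel-construction step.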
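
And a small sketch of the repeat-and-keep-best protocol quoted under Dataset Splits, assuming a clustering objective that is minimized (as in k-means-style methods); `run_once` is a hypothetical callable standing in for one randomly initialized run, not part of the paper or the released code:

```python
import numpy as np

def best_of_runs(run_once, n_runs=20, seed=0):
    """Repeat a randomly initialized clustering run n_runs times and keep
    the result whose objective value is best (lowest).

    run_once(rng) -> (labels, objective) is a hypothetical callable.
    """
    rng = np.random.default_rng(seed)
    best_labels, best_obj = None, np.inf
    for _ in range(n_runs):
        labels, obj = run_once(rng)
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels, best_obj
```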