Multiple Kernel k-Means with Incomplete Kernels

Authors: Xinwang Liu, Miaomiao Li, Lei Wang, Yong Dou, Jianping Yin, En Zhu

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on four benchmark data sets to compare the proposed algorithm with existing imputation-based methods. Our algorithm consistently achieves superior performance and the improvement becomes more significant with increasing missing ratio, verifying the effectiveness and advantages of the proposed joint imputation and clustering.
Researcher Affiliation | Academia | Xinwang Liu, Miaomiao Li: School of Computer, National University of Defense Technology, Changsha, China, 410073. Lei Wang: School of Computer Science and Software Engineering, University of Wollongong, NSW, Australia, 2522. Yong Dou, Jianping Yin, En Zhu: School of Computer, National University of Defense Technology, Changsha, China, 410073.
Pseudocode | Yes | Algorithm 1 Proposed Multiple Kernel k-means with Incomplete Kernels
Open Source Code | No | The Matlab codes of kernel k-means and MKKM are publicly downloaded from https://github.com/mehmetgonen/lmkkmeans. (This refers to baseline code, not the code for the method described in this paper.)
Open Datasets | Yes | The proposed algorithm is experimentally evaluated on four widely used MKL benchmark data sets shown in Table 1. They are Oxford Flower17 [1], Oxford Flower102 [2], Columbia Consumer Video (CCV) [3] and Caltech102 [4]. [1] http://www.robots.ox.ac.uk/~vgg/data/flowers/17/ [2] http://www.robots.ox.ac.uk/~vgg/data/flowers/102/ [3] http://www.ee.columbia.edu/ln/dvmm/CCV/ [4] http://files.is.tue.mpg.de/pgehler/projects/iccv09/
Dataset Splits | No | The paper does not provide dataset split information for training, validation, and test sets. It describes how incomplete kernels are generated for evaluation across different missing ratios, but not standard data splits for model development and evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions "The Matlab codes of kernel k-means and MKKM are publicly downloaded..." but does not specify a Matlab version or any other software dependencies with version numbers.
Experiment Setup | Yes | For all data sets, it is assumed that the true number of clusters is known and it is set as the true number of classes. For the CCV data set, we generate six base kernels by applying both a linear kernel and a Gaussian kernel on its SIFT, STIP and MFCC features, where the widths of the three Gaussian kernels are set as the mean of all pairwise sample distances, respectively. Following the literature (Cortes, Mohri, and Rostamizadeh 2012), all base kernels are centered and scaled so that we have κ_p(x_i, x_i) = 1 for all i and p. The parameter ε, termed missing ratio in this experiment, controls the percentage of samples that have absent views, and it affects the performance of the algorithms in comparison. Specifically, ε on all the four data sets is set as [0.1 : 0.1 : 0.9]. For all algorithms, we repeat each experiment 50 times with random initialization to reduce the effect of randomness caused by k-means, and report the best result. Meanwhile, we randomly generate the incomplete patterns 30 times in the above-mentioned way and report the statistical results.
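The kernel preparation described in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code (which is not released): the function names, the toy feature matrix, the Gaussian form exp(-d^2 / (2σ^2)), and averaging distances over distinct pairs only are all assumptions made here. It uses plain Python lists so that it has no dependencies beyond the standard library.

```python
import math


def linear_kernel(X):
    """Gram matrix of inner products between feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def gaussian_kernel(X):
    """Gaussian kernel whose width is the mean pairwise Euclidean distance."""
    n = len(X)
    dist = [[math.dist(xi, xj) for xj in X] for xi in X]
    # Assumption: the mean is taken over distinct pairs i < j.
    n_pairs = n * (n - 1) / 2
    sigma = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / n_pairs
    return [[math.exp(-d * d / (2.0 * sigma * sigma)) for d in row] for row in dist]


def center_and_scale(K):
    """Center K in feature space, then rescale so every diagonal entry is 1."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so the row means double as column means.
    Kc = [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
          for i in range(n)]
    diag = [Kc[i][i] for i in range(n)]
    if min(diag) <= 0.0:
        raise ValueError("a centered sample has zero norm; cannot rescale")
    return [[Kc[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
            for i in range(n)]


# Missing ratios 0.1, 0.2, ..., 0.9, i.e. the grid written [0.1 : 0.1 : 0.9].
missing_ratios = [round(0.1 * k, 1) for k in range(1, 10)]

# Hypothetical toy features standing in for one feature type (e.g. SIFT).
X = [[0.0, 1.0], [1.0, 0.5], [2.0, 3.0], [4.0, 1.5]]
base_kernels = [center_and_scale(linear_kernel(X)),
                center_and_scale(gaussian_kernel(X))]
```

Applying both kernel functions to each of the three CCV feature types would give the six base kernels mentioned in the row; the sketch builds only two, from a single toy feature matrix.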