Randomized Clustered Nystrom for Large-Scale Kernel Machines

Authors: Farhad Pourkamali-Anaraki, Stephen Becker, Michael Wakin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Moreover, numerical experiments on classification and regression tasks demonstrate the superior performance and efficiency of our proposed method compared with existing approaches." "In this section, we present experimental results comparing our Randomized Clustered Nyström (Algorithm 1) with state-of-the-art methods."
Researcher Affiliation | Academia | Farhad Pourkamali-Anaraki, University of Colorado Boulder (farhad.pourkamali@colorado.edu); Stephen Becker, University of Colorado Boulder (stephen.becker@colorado.edu); Michael B. Wakin, Colorado School of Mines (mwakin@mines.edu)
Pseudocode | Yes | Algorithm 1 Randomized Clustered Nyström. Input: data set X, number of landmark points m, sketching dimension p' < p. Output: landmark points Z. 1: Generate a random sign matrix H ∈ R^(p'×p) as in (10). 2: Compute X' = HX ∈ R^(p'×n). 3: Perform K-means clustering on X' = [x'_1, ..., x'_n] to get S'_opt. 4: Compute the sample means in the original space, cf. (11). 5: Z = [z_1, ..., z_m] ∈ R^(p×m). (A runnable MATLAB sketch follows the table.)
Open Source Code | No | The paper states: "Our proposed approach is implemented in MATLAB with the C/mex implementation for computing the sample mean." However, it does not provide any link or explicit statement about the public availability of this code.
Open Datasets | Yes | "We examine the quality and generalization performance of the kernel approximation methods on classification and regression tasks using three benchmark high-dimensional data sets from the LIBSVM archive (Chang and Lin 2011)": svhn (p = 3,072, n = 60,000); rcv1-binary (p = 47,236, n = 20,242); E2006-tfidf (p = 150,360, n = 6,000). (A loading sketch follows the table.)
Dataset Splits | No | The paper specifies training and testing splits (e.g., "we randomly sample ntrain = 0.8n data points... for training and the remaining ntest = 0.2n data points for testing") but does not describe a separate validation split or any cross-validation strategy. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions: "In our experiments, we use Intel MKL BLAS version 11.2.3 which is bundled with MATLAB". However, it does not specify the CPU or GPU models or any other hardware used to run the experiments.
Software Dependencies | Yes | "Our proposed approach is implemented in MATLAB with the C/mex implementation for computing the sample mean. To perform K-means clustering, we use MATLAB's built-in function kmeans and the maximum number of iterations is set to 10. In our experiments, we use Intel MKL BLAS version 11.2.3 which is bundled with MATLAB."
Experiment Setup | Yes | "To perform K-means clustering, we use MATLAB's built-in function kmeans and the maximum number of iterations is set to 10. In all experiments, based on (Zhang and Kwok 2010), the Gaussian kernel function κ(x_i, x_j) = exp(-||x_i - x_j||_2^2 / c) is used, with the parameter c chosen as the averaged squared distance between all the data points and the sample mean. We set the rank parameter r = 20, regularization λ = 2^(-4), and p' = 20." A later experiment fixes r = 20 and uses two values, p' = 20 and p' = 100. (See the kernel sketch after the table.)
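
The Pseudocode row maps directly onto a short MATLAB function. Below is a minimal sketch of Algorithm 1, not the authors' released implementation: the function name and the 1/sqrt(p') scaling of the sign matrix are assumptions, since the paper's equations (10) and (11) are not reproduced here.

    % Minimal sketch of Algorithm 1 (Randomized Clustered Nystrom).
    % X is p-by-n (columns are data points), m is the number of landmarks,
    % pp is the sketching dimension p' < p.
    function Z = randomized_clustered_nystrom(X, m, pp)
        [p, ~] = size(X);
        % Step 1: random sign matrix; the 1/sqrt(pp) scaling is an assumption
        % standing in for the paper's equation (10).
        H = sign(randn(pp, p)) / sqrt(pp);
        % Step 2: sketch the data down to p' dimensions.
        Xs = H * X;
        % Step 3: K-means on the sketched data, capped at 10 iterations as in the paper.
        labels = kmeans(Xs', m, 'MaxIter', 10);
        % Steps 4-5: each landmark is the sample mean of its cluster in the
        % ORIGINAL p-dimensional space, cf. the paper's equation (11).
        Z = zeros(p, m);
        for j = 1:m
            Z(:, j) = mean(X(:, labels == j), 2);
        end
    end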
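All three data sets are distributed in LIBSVM's sparse text format, so they can be read with the libsvmread MEX function from LIBSVM's MATLAB interface (Chang and Lin 2011); the file name below is a placeholder.

    % Sketch: read a LIBSVM-format data set; requires LIBSVM's MATLAB interface.
    [y, Xsp] = libsvmread('rcv1_train.binary');  % y: labels, Xsp: n-by-p sparse
    X = Xsp';                                    % transpose to p-by-n, as above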
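The 80/20 split quoted in the Dataset Splits row is easy to reproduce; this sketch assumes a uniform random split, which is all the paper specifies.

    % Sketch of the random 80/20 train/test split described in the paper.
    n = size(X, 2);
    perm = randperm(n);
    ntrain = round(0.8 * n);
    Xtrain = X(:, perm(1:ntrain));       ytrain = y(perm(1:ntrain));
    Xtest  = X(:, perm(ntrain+1:end));   ytest  = y(perm(ntrain+1:end));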
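The Experiment Setup row fixes the Gaussian kernel bandwidth c as the averaged squared distance between the data points and the sample mean. A direct transcription of that choice, assuming a full p-by-n matrix X:

    % Sketch: Gaussian kernel kappa(xi, xj) = exp(-||xi - xj||^2 / c), with c
    % set to the averaged squared distance from the data points to their mean.
    xbar = mean(X, 2);
    c = mean(sum((X - xbar).^2, 1));     % implicit expansion, R2016b or later
    kfun = @(xi, xj) exp(-sum((xi - xj).^2) / c);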