Randomized Clustered Nystrom for Large-Scale Kernel Machines

Authors: Farhad Pourkamali-Anaraki, Stephen Becker, Michael Wakin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Moreover, numerical experiments on classification and regression tasks demonstrate the superior performance and efficiency of our proposed method compared with existing approaches." "In this section, we present experimental results comparing our Randomized Clustered Nyström (Algorithm 1) with state-of-the-art methods."
Researcher Affiliation | Academia | Farhad Pourkamali-Anaraki, University of Colorado Boulder (farhad.pourkamali@colorado.edu); Stephen Becker, University of Colorado Boulder (stephen.becker@colorado.edu); Michael B. Wakin, Colorado School of Mines (mwakin@mines.edu)
Pseudocode | Yes | Algorithm 1 Randomized Clustered Nyström. Input: data set X, number of landmark points m, sketching dimension p' < p. Output: landmark points Z. 1: Generate a random sign matrix H ∈ R^(p'×p) as in (10). 2: Compute X' = HX ∈ R^(p'×n). 3: Perform K-means clustering on X' = [x'_1, ..., x'_n] to get S'_opt. 4: Compute the sample means in the original space, cf. (11). 5: Z = [z_1, ..., z_m] ∈ R^(p×m). (A runnable MATLAB sketch follows the table.)
Open Source Code | No | The paper states: "Our proposed approach is implemented in MATLAB with the C/mex implementation for computing the sample mean." However, it does not provide any link or explicit statement about the public availability of this code.
Open Datasets | Yes | "We examine the quality and generalization performance of the kernel approximation methods on classification and regression tasks using three benchmark high-dimensional data sets from the LIBSVM archive (Chang and Lin 2011)": svhn (p = 3,072, n = 60,000); rcv1-binary (p = 47,236, n = 20,242); E2006-tfidf (p = 150,360, n = 6,000). (A loading sketch follows the table.)
Dataset Splits | No | The paper specifies training and testing splits (e.g., "we randomly sample ntrain = 0.8n data points... for training and the remaining ntest = 0.2n data points for testing") but does not describe a separate validation split or any cross-validation strategy. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions: "In our experiments, we use Intel MKL BLAS version 11.2.3 which is bundled with MATLAB". However, it does not specify the CPU or GPU models or any other hardware used to run the experiments.
Software Dependencies | Yes | "Our proposed approach is implemented in MATLAB with the C/mex implementation for computing the sample mean. To perform K-means clustering, we use MATLAB's built-in function kmeans and the maximum number of iterations is set to 10. In our experiments, we use Intel MKL BLAS version 11.2.3 which is bundled with MATLAB."
Experiment Setup | Yes | "To perform K-means clustering, we use MATLAB's built-in function kmeans and the maximum number of iterations is set to 10. In all experiments, based on (Zhang and Kwok 2010), the Gaussian kernel function κ(x_i, x_j) = exp(-||x_i - x_j||_2^2 / c) is used, with the parameter c chosen as the averaged squared distance between all the data points and the sample mean. We set the rank parameter r = 20, regularization λ = 2^(-4), and p' = 20." A later experiment fixes r = 20 and uses two values, p' = 20 and p' = 100. (See the kernel sketch after the table.)
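
The Pseudocode row maps directly onto a short MATLAB function. Below is a minimal sketch of Algorithm 1, not the authors' released implementation: the function name and the 1/sqrt(p') scaling of the sign matrix are assumptions, since the paper's equations (10) and (11) are not reproduced here.

    % Minimal sketch of Algorithm 1 (Randomized Clustered Nystrom).
    % X is p-by-n (columns are data points), m is the number of landmarks,
    % pp is the sketching dimension p' < p.
    function Z = randomized_clustered_nystrom(X, m, pp)
        [p, ~] = size(X);
        % Step 1: random sign matrix; the 1/sqrt(pp) scaling is an assumption
        % standing in for the paper's equation (10).
        H = sign(randn(pp, p)) / sqrt(pp);
        % Step 2: sketch the data down to p' dimensions.
        Xs = H * X;
        % Step 3: K-means on the sketched data, capped at 10 iterations as in the paper.
        labels = kmeans(Xs', m, 'MaxIter', 10);
        % Steps 4-5: each landmark is the sample mean of its cluster in the
        % ORIGINAL p-dimensional space, cf. the paper's equation (11).
        Z = zeros(p, m);
        for j = 1:m
            Z(:, j) = mean(X(:, labels == j), 2);
        end
    end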
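All three data sets are distributed in LIBSVM's sparse text format, so they can be read with the libsvmread MEX function from LIBSVM's MATLAB interface (Chang and Lin 2011); the file name below is a placeholder.

    % Sketch: read a LIBSVM-format data set; requires LIBSVM's MATLAB interface.
    [y, Xsp] = libsvmread('rcv1_train.binary');  % y: labels, Xsp: n-by-p sparse
    X = Xsp';                                    % transpose to p-by-n, as above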
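The 80/20 split quoted in the Dataset Splits row is easy to reproduce; this sketch assumes a uniform random split, which is all the paper specifies.

    % Sketch of the random 80/20 train/test split described in the paper.
    n = size(X, 2);
    perm = randperm(n);
    ntrain = round(0.8 * n);
    Xtrain = X(:, perm(1:ntrain));       ytrain = y(perm(1:ntrain));
    Xtest  = X(:, perm(ntrain+1:end));   ytest  = y(perm(ntrain+1:end));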
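The Experiment Setup row fixes the Gaussian kernel bandwidth c as the averaged squared distance between the data points and the sample mean. A direct transcription of that choice, assuming a full p-by-n matrix X:

    % Sketch: Gaussian kernel kappa(xi, xj) = exp(-||xi - xj||^2 / c), with c
    % set to the averaged squared distance from the data points to their mean.
    xbar = mean(X, 2);
    c = mean(sum((X - xbar).^2, 1));     % implicit expansion, R2016b or later
    kfun = @(xi, xj) exp(-sum((xi - xj).^2) / c);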