Computationally Efficient Nyström Approximation using Fast Transforms

Authors: Si Si, Cho-Jui Hsieh, Inderjit Dhillon

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5, we show the experimental results. Figure 3: low-rank kernel approximation results; the x-axis is time and the y-axis shows the relative kernel approximation error. Table 3: data set statistics. Table 4: comparison of kernel SVM prediction on four real-world datasets.
Researcher Affiliation | Academia | Department of Computer Science, University of Texas at Austin; Departments of Statistics and Computer Science, University of California at Davis
Pseudocode | Yes | Algorithm 1: Fast Transforms for Nyström Approximation
Open Source Code | No | The paper does not provide any explicit statement or link for open-source code availability for the methodology described.
Open Datasets | Yes | MNIST dataset with 60,000 samples; webspam data (more than 300,000 data points). Table 3, data set statistics (n: number of samples; d: dimension of samples): USPS (n=9,298, d=256); Covtype (n=581,012, d=54); a9a (n=48,842, d=123); MNIST (n=60,000, d=784); Letter (n=18,000, d=16); CIFAR (n=60,000, d=400); Epsilon (n=25,000, d=2,000); webspam (n=350,000, d=254).
Dataset Splits | No | The paper mentions 'train', 'validation', and 'test' in Table 1 but does not specify explicit dataset splits (e.g., percentages or sample counts) for the training, validation, and test sets used in the experiments. It refers to 'validation' in the context of the pseudo-inverse calculation, not as a dataset split for model evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | The degree p is set to be 3 in the experiment. In practice, the algorithm usually converges to a reasonably good solution in 10 iterations, so we fix the number of iterations to be 10 for all the experiments. To further improve the speed, in the experiments, we randomly sample 2000 data points to learn the seeds.
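For context on the Pseudocode and Experiment Setup rows above, the following is a minimal sketch of the standard Nyström low-rank kernel approximation (K ≈ C W⁺ Cᵀ with uniformly sampled landmark points) and of the relative kernel approximation error metric used as the y-axis in Figure 3. It does not implement the paper's fast-transform construction of the landmark ("seed") points; the helper names rbf_kernel and nystrom_approximation, the kernel parameter gamma, the landmark count m, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def nystrom_approximation(X, m=200, gamma=0.1, seed=None):
    """Standard Nystrom approximation K ~= C @ W_pinv @ C.T
    using m uniformly sampled landmark ("seed") points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)    # landmark indices
    landmarks = X[idx]
    C = rbf_kernel(X, landmarks, gamma)           # n x m block
    W = rbf_kernel(landmarks, landmarks, gamma)   # m x m core block
    W_pinv = np.linalg.pinv(W)                    # pseudo-inverse of the core block
    return C, W_pinv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data; the paper reports sampling 2,000 points to learn the seeds.
    X = rng.standard_normal((2000, 50))
    C, W_pinv = nystrom_approximation(X, m=100, gamma=0.05, seed=0)
    K = rbf_kernel(X, X, gamma=0.05)
    K_hat = C @ W_pinv @ C.T
    rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
    print(f"relative kernel approximation error: {rel_err:.4f}")
```

The paper's contribution replaces the uniform landmark sampling above with landmarks built via fast transforms (learned from a small subsample, with degree p = 3 and 10 iterations per the Experiment Setup row); the approximation and error formulas are otherwise the same.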