Recycling Randomness with Structure for Sublinear time Kernel Expansions

Authors: Krzysztof Choromanski, Vikas Sindhwani

ICML 2016

Reproducibility Variable — Result — LLM Response
Research Type — Experimental. "Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features. In this section, we compare feature maps obtained with fully Gaussian, Fastfood, Circulant, and Toeplitz-like matrices with increasing displacement rank. Our goal is to lend support to the theoretical contributions of this paper by showing that high-quality feature maps can be constructed from a broad class of structured matrices as instantiations of the proposed P-model. Kernel Approximation Quality: In Figure 5, we report the relative Frobenius error in reconstructing the Gram matrix, i.e. ‖K − K̃‖_F / ‖K‖_F, where K and K̃ denote the exact and approximate Gram matrices, as a function of the number of random features. We use the g50c dataset, which comprises 550 examples drawn from multivariate Gaussians in 50-dimensional space with means separated such that the Bayes error is 5%. Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for the complex exponential nonlinearity (Gaussian kernel)."
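The quoted error metric is straightforward to reproduce. Below is a minimal NumPy sketch, assuming unstructured Gaussian random Fourier features with the complex exponential nonlinearity; the dimensions, bandwidth sigma, and synthetic data are illustrative choices standing in for g50c, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for g50c: n points in d = 50 dimensions.
n, d, m = 200, 50, 2000     # m = number of random features (illustrative)
sigma = 10.0                # Gaussian kernel bandwidth (illustrative)
X = rng.standard_normal((n, d))

# Exact Gaussian kernel Gram matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
K = np.exp(-D2 / (2.0 * sigma**2))

# Random Fourier features with the complex exponential nonlinearity:
# phi(x) = exp(i G x / sigma) / sqrt(m), with an unstructured Gaussian G.
G = rng.standard_normal((m, d))
Phi = np.exp(1j * (X @ G.T) / sigma) / np.sqrt(m)
K_approx = np.real(Phi @ Phi.conj().T)   # approximate Gram matrix

# Relative Frobenius error ||K - K~||_F / ||K||_F from the quote above.
err = np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro")
```

Swapping `G` for a structured matrix (Fastfood, circulant, Toeplitz-like) is the paper's subject; this sketch only reproduces the error metric itself, which shrinks as `m` grows.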
Researcher Affiliation — Industry. "Krzysztof Choromanski KCHORO@GOOGLE.COM, Vikas Sindhwani SINDHWANI@GOOGLE.COM"
Pseudocode — No. No pseudocode or algorithm blocks were found.
Open Source Code — No. The paper does not provide concrete access to source code for the methodology described.
Open Datasets — Yes. "We use the g50c dataset, which comprises 550 examples drawn from multivariate Gaussians in 50-dimensional space with means separated such that the Bayes error is 5%. Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for the complex exponential nonlinearity (Gaussian kernel)." The named datasets are USPS (k=256), DNA (k=80), and COIL (k=1024).
Dataset Splits — No. The paper mentions datasets used for evaluation but does not specify exact training/validation/test splits.
Hardware Specification — Yes. "Figure 3 shows the speedup obtained in feature-map construction time using structured matrices relative to using unstructured Gaussian random matrices (on a 6-core 32-GB Intel(R) Xeon(R) machine running Matlab R2014a)."
Software Dependencies — Yes. "Figure 3 shows the speedup obtained in feature-map construction time using structured matrices relative to using unstructured Gaussian random matrices (on a 6-core 32-GB Intel(R) Xeon(R) machine running Matlab R2014a)." The only named dependency is Matlab R2014a.
Experiment Setup — Yes. "Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for the complex exponential nonlinearity (Gaussian kernel). In all experiments, for Toeplitz-like matrices, we used skew-circulant parameters (the h vectors in Eqn. 5) with average sparsity of 5."
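The speedups quoted above rest on a standard fact: a circulant matrix admits an O(d log d) matrix-vector product via the FFT, versus O(d^2) for a dense matrix. A minimal NumPy sketch of the plain circulant case (the paper's skew-circulant variant adds a sign modulation not shown here; this is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64

# A circulant matrix C is fully determined by its first column c,
# with C[i, j] = c[(i - j) mod d]. Then C @ x is the circular
# convolution of c and x, computable via the FFT in O(d log d):
#     C @ x = ifft(fft(c) * fft(x))
c = rng.standard_normal(d)
x = rng.standard_normal(d)

fast = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Dense reference: materialize C column by column and multiply in O(d^2).
C = np.empty((d, d))
for j in range(d):
    C[:, j] = np.roll(c, j)   # column j is c cyclically shifted down by j
naive = C @ x
```

The two products agree to floating-point precision, while the FFT path never builds the d-by-d matrix — the source of the feature-map construction speedups reported in Figure 3.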