Recycling Randomness with Structure for Sublinear time Kernel Expansions
Authors: Krzysztof Choromanski, Vikas Sindhwani
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features. In this section, we compare feature maps obtained with fully Gaussian, Fastfood, Circulant, and Toeplitz-like matrices with increasing displacement rank. Our goal is to lend support to the theoretical contributions of this paper by showing that high-quality feature maps can be constructed from a broad class of structured matrices as instantiations of the proposed P-model. Kernel Approximation Quality: In Figure 5, we report relative Frobenius error in reconstructing the Gram matrix, i.e. ‖K − K̃‖_fro / ‖K‖_fro, where K, K̃ denote the exact and approximate Gram matrices, as a function of the number of random features. We use the g50c dataset, which comprises 550 examples drawn from multivariate Gaussians in 50-dimensional space with means separated such that the Bayes error is 5%. Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for complex exponential nonlinearity (Gaussian kernel). |
| Researcher Affiliation | Industry | Krzysztof Choromanski KCHORO@GOOGLE.COM Vikas Sindhwani SINDHWANI@GOOGLE.COM |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use the g50c dataset which comprises of 550 examples drawn from multivariate Gaussians in 50-dimensional space with means separated such that the Bayes error is 5%. Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for complex exponential nonlinearity (Gaussian kernel). USPS (k=256), DNA (k=80), COIL (k=1024) are used as dataset names. |
| Dataset Splits | No | The paper mentions datasets used for evaluation but does not specify exact training/validation/test dataset splits. |
| Hardware Specification | Yes | Figure 3 shows the speedup obtained in feature-map construction time using structured matrices relative to using unstructured Gaussian random matrices (on a 6-core 32-GB Intel(R) Xeon(R) machine running Matlab R2014a). |
| Software Dependencies | Yes | Figure 3 shows the speedup obtained in feature-map construction time using structured matrices relative to using unstructured Gaussian random matrices (on a 6-core 32-GB Intel(R) Xeon(R) machine running Matlab R2014a). |
| Experiment Setup | Yes | Results on publicly available real-world classification datasets, averaged over 100 runs, are reported in Table 1 for complex exponential nonlinearity (Gaussian kernel). In all experiments, for Toeplitz-like matrices, we used skew-Circulant parameters (the h vectors in Eqn. 5) with average sparsity of 5. |
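The kernel-approximation experiment summarized above can be sketched in a few lines. This is not the authors' code: the dataset, problem sizes, and kernel bandwidth below are illustrative stand-ins (the paper's g50c data is 550 × 50, and its Fastfood and Toeplitz-like variants are not reproduced here). The sketch builds Gaussian-kernel random Fourier features from a circulant projection applied via the FFT, then measures the relative Frobenius reconstruction error ‖K − K̃‖_fro / ‖K‖_fro as the number of random features m grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50                      # illustrative sizes (the paper's g50c is 550 x 50)
sigma = np.sqrt(d)                  # illustrative bandwidth, not taken from the paper
X = rng.standard_normal((n, d))

# Exact Gaussian kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma**2))

def circulant_features(X, m, sigma, rng):
    """Random Fourier features whose Gaussian projection matrix is circulant,
    applied blockwise with the FFT (O(d log d) per block instead of O(d^2))."""
    n, d = X.shape
    blocks = []
    for _ in range(-(-m // d)):     # ceil(m / d) independent circulant blocks
        g = rng.standard_normal(d)  # first column of the circulant block
        # Multiplying by a circulant matrix is a circular convolution via FFT.
        blocks.append(np.fft.irfft(np.fft.rfft(X, axis=1) * np.fft.rfft(g), n=d, axis=1))
    P = np.hstack(blocks)[:, :m] / sigma
    # Complex exponential nonlinearity: real and imaginary parts give cos/sin features.
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(m)

errs = {}
for m in (50, 200, 800):
    Z = circulant_features(X, m, sigma, rng)
    K_hat = Z @ Z.T
    errs[m] = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
    print(f"m = {m:4d}  relative Frobenius error = {errs[m]:.3f}")
```

Although the rows of a circulant block are correlated (each is a cyclic shift of the same Gaussian vector g), each row is marginally Gaussian, so every feature is individually an unbiased random Fourier feature; recycling randomness this way trades some variance for sublinear-time projection, which is the trade-off the paper analyzes.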