Fast Neural Kernel Embeddings for General Activations
Authors: Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform experiments with the proposed neural kernels based on our dual kernel approximation. All experiments run using a single A100 GPU machine. We first benchmark our algorithm to approximate the dual kernel matrix. We use ReLU, Abs (i.e., σ(t) = \|t\|), sin, Gaussian, erf, and GeLU activations and approximate them by their Hermite expansions with degree varying from q = 1 to 20. We randomly generate n = 1,000 inputs of dimension 256, where each entry is drawn i.i.d. from N(0, 1/256). We also compare our approach to the Monte Carlo estimation of the dual kernel, i.e., Kσ(x, y) ≈ (1/m) Σᵢ₌₁ᵐ σ(⟨wᵢ, x⟩)σ(⟨wᵢ, y⟩), where {wᵢ}ᵢ₌₁ᵐ are i.i.d. standard Gaussian vectors. In Figure 1, we plot relative errors of the Frobenius norm of kernel approximations in terms of wall-clock times (top) and polynomial degree (bottom). |
| Researcher Affiliation | Collaboration | Insu Han1 Amir Zandieh2 Jaehoon Lee3 Roman Novak3 Lechao Xiao3 Amin Karbasi1,3 1Yale University 2Max-Planck-Institut für Informatik 3Google Research |
| Pseudocode | Yes | Algorithm 1 Subspace Embedding of Homogeneous NNGP and NTK |
| Open Source Code | Yes | We open-source NNGP and NTK for new activations within the Neural Tangents library [42] and sketching algorithm at https://github.com/insuhan/ntk_activations. |
| Open Datasets | Yes | Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves a 10⁶× speedup for approximate CNTK of a 5-layer Myrtle network on the CIFAR-10 dataset. |
| Dataset Splits | No | The paper reports selecting 'the best test accuracy among 20 choices of ridge parameters' which implies hyperparameter tuning, but it does not specify a distinct validation dataset split. It doesn't mention explicit percentages or sample counts for a validation set. |
| Hardware Specification | Yes | All experiments run using a single A100 GPU machine. |
| Software Dependencies | No | The paper mentions 'Neural Tangents library [42]' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | We extract CNTK features of a 5-layer convolutional neural network (known as Myrtle5 [54]) without pooling by setting degree q = 8 and explore feature dimensions m ∈ {2⁹, …, 2¹⁴} and homogeneous dual kernels including ReLU and ABReLU activations as well as deep normalized Gaussian kernels with 2 scaling factors. |
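The Monte Carlo baseline quoted in the Research Type row — Kσ(x, y) ≈ (1/m) Σᵢ σ(⟨wᵢ, x⟩)σ(⟨wᵢ, y⟩) with i.i.d. standard Gaussian wᵢ — can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's released code: the function name `dual_kernel_mc` and the small problem sizes are assumptions, and the closed-form ReLU dual kernel (the first-order arc-cosine kernel) is used only as a reference to check the estimate against.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 256, 8, 100_000  # input dim, number of inputs, number of random features
# Inputs with entries i.i.d. N(0, 1/d), matching the paper's benchmark setup.
X = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n, d))

def dual_kernel_mc(X, sigma, m, rng):
    """Monte Carlo dual kernel: (1/m) * sum_i sigma(<w_i, x>) sigma(<w_i, y>)."""
    W = rng.normal(size=(m, X.shape[1]))   # i.i.d. standard Gaussian features w_i
    F = sigma(X @ W.T) / np.sqrt(m)        # (n, m) random feature map
    return F @ F.T                         # (n, n) kernel estimate

relu = lambda t: np.maximum(t, 0.0)
K_mc = dual_kernel_mc(X, relu, m, rng)

# Closed-form ReLU dual kernel (first-order arc-cosine kernel) for comparison:
#   K(x, y) = (|x||y| / 2π) * (sin θ + (π − θ) cos θ),  θ = angle between x and y
norms = np.linalg.norm(X, axis=1)
cos_t = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
theta = np.arccos(cos_t)
K_exact = np.outer(norms, norms) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)

rel_err = np.linalg.norm(K_mc - K_exact) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_err:.4f}")
```

The relative Frobenius error shrinks like O(1/√m), which is the slow convergence the paper's Hermite-expansion approach is benchmarked against in Figure 1.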