Fast Neural Kernel Embeddings for General Activations

Authors: Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform experiments with the proposed neural kernels based on our dual kernel approximation. All experiments run using a single A100 GPU machine. We first benchmark our algorithm for approximating the dual kernel matrix. We use ReLU, Abs (i.e., σ(t) = |t|), sin, Gaussian, erf, and GeLU activations and approximate them by their Hermite expansions, with degree ranging from q = 1 to 20. We randomly generate n = 1,000 inputs of dimension 256, where each entry is i.i.d. drawn from N(0, 1/256). We also compare our approach to the Monte Carlo estimation of the dual kernel, i.e., K_σ(x, y) ≈ (1/m) ∑_{i=1}^{m} σ(⟨w_i, x⟩) σ(⟨w_i, y⟩), where {w_i}_{i=1}^{m} are i.i.d. standard Gaussian vectors. In Figure 1, we plot relative errors in Frobenius norm of the kernel approximations against wall-clock time (top) and polynomial degree (bottom). (See the sketch below for the Monte Carlo baseline.)
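
As a concrete illustration of the Monte Carlo baseline quoted above, here is a minimal JAX sketch; `mc_dual_kernel` and its parameters are illustrative names, not the paper's released code. For ReLU, the closed-form arc-cosine kernel of degree 1 (Cho and Saul, 2009) is included as a ground-truth check.

```python
import jax
import jax.numpy as jnp

def mc_dual_kernel(x, y, sigma, m=8192, seed=0):
    """Monte Carlo estimate of the dual kernel:
    K_sigma(x, y) ~ (1/m) * sum_i sigma(<w_i, x>) * sigma(<w_i, y>),
    with w_i drawn i.i.d. from N(0, I_d)."""
    d = x.shape[-1]
    w = jax.random.normal(jax.random.PRNGKey(seed), (m, d))
    return jnp.mean(sigma(w @ x) * sigma(w @ y))

def relu_dual_exact(x, y):
    """Closed-form ReLU dual kernel (arc-cosine kernel of degree 1):
    K(x, y) = (|x||y| / (2*pi)) * (sin t + (pi - t) cos t), t = angle(x, y)."""
    nx, ny = jnp.linalg.norm(x), jnp.linalg.norm(y)
    t = jnp.arccos(jnp.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny / (2 * jnp.pi) * (jnp.sin(t) + (jnp.pi - t) * jnp.cos(t))

# Inputs matching the quoted setup: entries i.i.d. N(0, 1/256) in dimension 256.
x = jax.random.normal(jax.random.PRNGKey(1), (256,)) / jnp.sqrt(256.0)
y = jax.random.normal(jax.random.PRNGKey(2), (256,)) / jnp.sqrt(256.0)
print(mc_dual_kernel(x, y, jax.nn.relu), relu_dual_exact(x, y))
```

The Monte Carlo estimate converges to the exact value at rate O(1/√m), which is why the paper's Hermite-expansion approach can be much faster at comparable accuracy.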
Researcher Affiliation | Collaboration | Insu Han¹, Amir Zandieh², Jaehoon Lee³, Roman Novak³, Lechao Xiao³, Amin Karbasi¹,³ (¹Yale University, ²Max-Planck-Institut für Informatik, ³Google Research)
Pseudocode | Yes | Algorithm 1: Subspace Embedding of Homogeneous NNGP and NTK
Open Source Code | Yes | We open-source NNGP and NTK for new activations within the Neural Tangents library [42] and the sketching algorithm at https://github.com/insuhan/ntk_activations. (A usage sketch follows below.)
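
For context, the Neural Tangents `stax` API referenced above evaluates infinite-width NNGP/NTK kernels in closed form. The following is a minimal sketch; the architecture, widths, and input shapes are illustrative assumptions, not the paper's experimental configuration.

```python
import jax
from neural_tangents import stax

# stax.serial returns (init_fn, apply_fn, kernel_fn); kernel_fn computes the
# exact NNGP and NTK kernels via the dual-kernel recursion, layer by layer.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x1 = jax.random.normal(jax.random.PRNGKey(0), (8, 256))
x2 = jax.random.normal(jax.random.PRNGKey(1), (8, 256))
nngp, ntk = kernel_fn(x1, x2, ('nngp', 'ntk'))  # two 8x8 Gram matrices
```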
Open Datasets | Yes | Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves a 10⁶× speedup for approximate CNTK of a 5-layer Myrtle network on the CIFAR-10 dataset.
Dataset Splits | No | The paper reports selecting "the best test accuracy among 20 choices of ridge parameters", which implies hyperparameter tuning, but it does not specify a distinct validation split, nor explicit percentages or sample counts for one.
Hardware Specification | Yes | All experiments run using a single A100 GPU machine.
Software Dependencies | No | The paper mentions the Neural Tangents library [42] but does not provide version numbers for it or any other software dependency.
Experiment Setup | Yes | We extract CNTK features of a 5-layer convolutional neural network (known as Myrtle5 [54]) without pooling by setting degree q = 8, and explore feature dimensions m ∈ {2⁹, …, 2¹⁴} and homogeneous dual kernels including ReLU and ABReLU activations, as well as deep normalized Gaussian kernels with 2 scaling factors. (See the sketch below for the exact-CNTK baseline this setup approximates.)
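
To make the quoted setup concrete, here is a hedged sketch of the exact CNTK for a pooling-free Myrtle5-style network in Neural Tangents; channel counts, padding, and the output head are assumptions, and the paper's contribution is to replace this exact computation with much cheaper sketched features.

```python
import jax
from neural_tangents import stax

# Pooling-free Myrtle5-style ConvNet (widths and padding are assumptions;
# see the paper and https://github.com/insuhan/ntk_activations for the
# exact architecture and the sketching-based approximation).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(512, (3, 3), padding='SAME'), stax.Relu(),
    stax.Conv(512, (3, 3), padding='SAME'), stax.Relu(),
    stax.Conv(512, (3, 3), padding='SAME'), stax.Relu(),
    stax.Conv(512, (3, 3), padding='SAME'), stax.Relu(),
    stax.Flatten(),
    stax.Dense(10),
)

# CIFAR-10-shaped inputs; exact CNTK cost grows rapidly with image size,
# which is what motivates the paper's ~10^6x-faster sketched features.
x = jax.random.normal(jax.random.PRNGKey(0), (4, 32, 32, 3))
cntk = kernel_fn(x, x, 'ntk')  # exact CNTK Gram matrix, shape (4, 4)
```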