Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Scaling Neural Tangent Kernels via Sketching and Random Features
Authors: Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark our methods on various large-scale regression and classification tasks and show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150× speedup. |
| Researcher Affiliation | Academia | Amir Zandieh (Max-Planck-Institut für Informatik); Insu Han (Yale University); Haim Avron (Tel Aviv University); Neta Shoham (Tel Aviv University); Chaewon Kim (KAIST); Jinwoo Shin (KAIST) |
| Pseudocode | Yes | Algorithm 1 NTKSKETCH for fully-connected ReLU networks ... Algorithm 2 Random Features for ReLU NTK via POLYSKETCH |
| Open Source Code | Yes | Codes are available at https://github.com/insuhan/ntk-sketch-rf. |
| Open Datasets | Yes | We first benchmark our proposed NTK approximation algorithms on MNIST [25] dataset and compare against gradient-based NTK random features [5] (GRADRF) as a baseline method. Next we test our CNTKSKETCH on CIFAR-10 dataset [24]. We also demonstrate the computational efficiency of our NTKSKETCH and NTKRF using 4 large-scale UCI regression datasets [17]. |
| Dataset Splits | No | We search the ridge parameter with a random subset of training set and choose the one that achieves the best validation accuracy. The paper uses a "random subset of training set" for validation but does not specify the size of this subset or how it is drawn, so the split cannot be reproduced exactly. |
| Hardware Specification | Yes | We run experiments on a system with an Intel E5-2630 CPU with 256 GB RAM and a single GeForce RTX 2080 GPUs with 12 GB RAM. |
| Software Dependencies | No | The paper states that "Codes are available at https://github.com/insuhan/ntk-sketch-rf" but does not list software dependencies or their version numbers in the text. |
| Experiment Setup | Yes | We use the ReLU network with depth L = 1. We search the ridge parameter with a random subset of training set and choose the one that achieves the best validation accuracy. We choose a convolutional network of depth L = 3 and compare CNTKSKETCH and GRADRF for various feature dimensions. For our methods and RFF, we fix the output dimension to m = 8,192 for all datasets. |
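Algorithms 1 and 2 (listed under Pseudocode) approximate the NTK of a fully-connected ReLU network, and the Experiment Setup row uses depth L = 1. A reproduction needs an exact reference kernel to measure approximation error against. The sketch below assumes the normalization κ0(t) = (π − arccos t)/π and κ1(t) = (√(1 − t²) + t(π − arccos t))/π (other papers differ by constant factors), under which the depth-1 kernel is Θ(x, y) = ⟨x, y⟩ κ0(cos θ) + ‖x‖‖y‖ κ1(cos θ). It compares this against a plain Monte Carlo random-feature estimate. That estimator is a swapped-in baseline built from Gaussian sampling, not NTKSKETCH or the PolySketch-based NTKRF, and all function names are hypothetical.

```python
import numpy as np


def kappa0(t):
    # Order-0 arc-cosine function: (pi - arccos t) / pi
    return (np.pi - np.arccos(t)) / np.pi


def kappa1(t):
    # Order-1 arc-cosine function: (sqrt(1 - t^2) + t * (pi - arccos t)) / pi
    return (np.sqrt(1.0 - t ** 2) + t * (np.pi - np.arccos(t))) / np.pi


def relu_ntk_depth1(X, Y):
    """Exact NTK Gram matrix of a one-hidden-layer ReLU network (rows are inputs)."""
    inner = X @ Y.T
    norms = np.outer(np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1))
    # Cosine similarity, clipped so round-off cannot push it outside [-1, 1]
    cos = np.clip(inner / norms, -1.0, 1.0)
    return inner * kappa0(cos) + norms * kappa1(cos)


def relu_ntk_random_features(X, m0, m1, rng):
    """Plain Monte Carlo features whose inner products estimate relu_ntk_depth1."""
    n, d = X.shape
    W0 = rng.standard_normal((d, m0))
    W1 = rng.standard_normal((d, m1))
    # 2 * E[1(w.x > 0) 1(w.y > 0)] = kappa0(cos); tensoring the step features
    # with x turns this into an estimate of <x, y> * kappa0(cos)
    step = (X @ W0 > 0).astype(X.dtype)
    phi0 = np.sqrt(2.0 / m0) * (step[:, :, None] * X[:, None, :]).reshape(n, m0 * d)
    # 2 * E[relu(w.x) relu(w.y)] = ||x|| ||y|| kappa1(cos)
    phi1 = np.sqrt(2.0 / m1) * np.maximum(X @ W1, 0.0)
    return np.hstack([phi0, phi1])


# Usage: compare the Monte Carlo Gram matrix with the exact one on random inputs
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
K = relu_ntk_depth1(X, X)
Phi = relu_ntk_random_features(X, m0=20000, m1=20000, rng=rng)
rel_err = np.linalg.norm(Phi @ Phi.T - K) / np.linalg.norm(K)
print(f"relative Frobenius error: {rel_err:.3f}")
```

The naive estimator's first block has dimension m0·d, which grows with the input dimension; avoiding that blow-up is the kind of cost the paper's sketching-based methods are designed to address.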
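The Dataset Splits and Experiment Setup rows both quote the ridge-parameter search on "a random subset of training set", and the report flags that subset as underspecified. The sketch below makes the unspecified choices explicit as parameters: the validation fraction, the λ grid, and the random seed are all hypothetical, as are the helper names. It also assumes one-hot regression targets with argmax prediction, a common choice for kernel or feature regression on classification tasks; the target encoding is not stated in the quoted text. Synthetic Gaussian clusters stand in for NTK features.

```python
import numpy as np


def ridge_fit(Phi, Y, lam):
    """Closed-form ridge regression: W = (Phi^T Phi + lam * I)^{-1} Phi^T Y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)


def select_ridge(Phi, labels, lambdas, val_fraction, rng):
    """Pick the lambda with the best accuracy on a random held-out subset.

    val_fraction and the lambda grid are hypothetical: the paper does not
    report the size of its validation subset.
    """
    n = Phi.shape[0]
    num_classes = labels.max() + 1
    Y = np.eye(num_classes)[labels]  # one-hot regression targets (assumption)
    perm = rng.permutation(n)
    n_val = int(round(val_fraction * n))
    val, tr = perm[:n_val], perm[n_val:]
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        W = ridge_fit(Phi[tr], Y[tr], lam)
        # Predict the class with the largest regression output
        acc = np.mean(np.argmax(Phi[val] @ W, axis=1) == labels[val])
        if acc > best_acc:  # strict ">" keeps the first lambda on ties
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


# Usage on synthetic data standing in for NTK/CNTK features
rng = np.random.default_rng(0)
centers = 3.0 * rng.standard_normal((3, 10))            # synthetic class means
labels = rng.integers(0, 3, size=300)
Phi = centers[labels] + rng.standard_normal((300, 10))  # noisy samples around the means
grid = [1e-3, 1e-1, 1e1, 1e3]
lam, acc = select_ridge(Phi, labels, grid, val_fraction=0.2, rng=rng)
print(f"chosen lambda = {lam:g}, validation accuracy = {acc:.3f}")
```

Recording `val_fraction`, the grid, and the seed alongside results is what would let a reader rerun the same split.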