Scalable Neural Network Kernels

Authors: Arijit Sehanobish, Krzysztof Marcin Choromanski, Yunfan Zhao, Kumar Avinava Dubey, Valerii Likhosherstov

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide rigorous theoretical analysis of all these concepts as well as an extensive empirical evaluation, ranging from point-wise kernel estimation to Transformers fine-tuning with novel adapter layers inspired by SNNKs.
Researcher Affiliation | Collaboration | Arijit Sehanobish (Independent Researcher), Krzysztof Choromanski (Google DeepMind & Columbia University), Yunfan Zhao (Harvard University), Avinava Dubey (Google Research), Valerii Likhosherstov (Waymo)
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks. Procedures are described in mathematical notation and natural language within the text.
Open Source Code | Yes | The code is provided at https://github.com/arijitthegame/neural-network-kernels.
Open Datasets | Yes | We use three large UCI classification datasets: Cover Type (~510K points, dim = 54), HIGGS (~11M points, dim = 28) and HEPMASS (~11M points, dim = 28) to evaluate SNNK.
Dataset Splits | Yes | For these datasets, we use a 25% stratified random sampling from the training set to create a validation set which is used to select the best model to run on the holdout set. (A sketch of this split appears after the table.)
Hardware Specification | Yes | All experiments on the smaller dataset are run on the Google Colab free version (T4 GPU), while experiments on the larger datasets used V100 and 40GB A100 GPUs in Google Colab.
Software Dependencies | No | The paper mentions software used, such as "Transformers (Wolf et al., 2020) and adapter Transformer library (Pfeiffer et al., 2020)", but does not specify exact version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For these experiments, we use a learning rate of 1e-3, a batch size of 64, and a constant scheduler with warmup steps equal to 6% of the total number of training steps, with the AdamW optimizer (Loshchilov & Hutter, 2019). (A sketch of this configuration appears after the table.)
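
The Dataset Splits row describes a 25% stratified hold-out drawn from the training data. Below is a minimal sketch of that procedure using scikit-learn; it is not taken from the authors' released code, and the names make_validation_split, X_train, y_train, and seed are placeholders.

# Hypothetical sketch of the validation split described in the paper:
# hold out 25% of the training set via stratified random sampling,
# leaving the original test (holdout) set untouched.
from sklearn.model_selection import train_test_split

def make_validation_split(X_train, y_train, seed=0):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train,
        test_size=0.25,        # 25% of the training data becomes validation
        stratify=y_train,      # stratified by class label
        random_state=seed,
    )
    return X_tr, X_val, y_tr, y_val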
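
Similarly, the Experiment Setup row reports AdamW with a 1e-3 learning rate, batch size 64, and a constant schedule whose warmup covers 6% of the total training steps. The following sketch shows one plausible PyTorch/Hugging Face realization of that configuration, assuming placeholder names model and num_training_steps; it is not the authors' training script.

# Hypothetical sketch of the reported optimizer/scheduler configuration.
import torch
from transformers import get_constant_schedule_with_warmup

def build_optimizer_and_scheduler(model, num_training_steps):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Warm up for 6% of the total training steps, then keep the LR constant.
    num_warmup_steps = int(0.06 * num_training_steps)
    scheduler = get_constant_schedule_with_warmup(
        optimizer, num_warmup_steps=num_warmup_steps
    )
    return optimizer, scheduler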