Scalable Neural Network Kernels
Authors: Arijit Sehanobish, Krzysztof Marcin Choromanski, Yunfan Zhao, Kumar Avinava Dubey, Valerii Likhosherstov
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide rigorous theoretical analysis of all these concepts as well as an extensive empirical evaluation, ranging from point-wise kernel estimation to Transformers fine-tuning with novel adapter layers inspired by SNNKs. |
| Researcher Affiliation | Collaboration | Arijit Sehanobish (Independent Researcher); Krzysztof Choromanski (Google DeepMind & Columbia University); Yunfan Zhao (Harvard University); Avinava Dubey (Google Research); Valerii Likhosherstov (Waymo) |
| Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks. Procedures are described in mathematical notation and natural language within the text. |
| Open Source Code | Yes | The code is provided at https://github.com/arijitthegame/neural-network-kernels. |
| Open Datasets | Yes | We use three large UCI classification datasets Cover Type (∼510K points, dim = 54), HIGGS (∼11M points, dim = 28) and HEPMASS (∼11M points, dim = 28) to evaluate SNNK. |
| Dataset Splits | Yes | For these datasets, we use a 25% stratified random sampling from the training set to create a validation set, which is used to select the best model to run on the holdout set. (A loading-and-split sketch follows the table.) |
| Hardware Specification | Yes | All experiments on the smaller dataset are run on the free version of Google Colab (T4 GPU), while experiments on the larger datasets used a V100 GPU and 40GB A100 GPUs in Google Colab. |
| Software Dependencies | No | The paper mentions software used, such as "Transformers (Wolf et al., 2020) and adapter Transformer library (Pfeiffer et al., 2020)", but does not specify exact version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For these experiments, we use a learning rate of 1e-3, a batch size of 64, and a constant scheduler with warmup steps equal to 6% of the total number of training steps, with the AdamW optimizer (Loshchilov & Hutter, 2019). (An optimizer/scheduler sketch follows the table.) |
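
The "Open Datasets" and "Dataset Splits" rows describe the UCI data and a 25% stratified validation split taken from the training set. The sketch below is an assumption-laden illustration, not code from the paper: it loads Cover Type through scikit-learn's `fetch_covtype` (HIGGS and HEPMASS are available from the UCI Machine Learning Repository), and the holdout proportion, variable names, and random seed are placeholders I introduce for illustration.

```python
# Minimal sketch of the data-split protocol quoted above (not the authors' code).
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split

# Cover Type: 54-dimensional features, 7 forest-cover classes.
X, y = fetch_covtype(return_X_y=True)

# Carve out a holdout test set first (the 20% proportion is an assumption;
# the paper's report only states that a holdout set exists).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 25% stratified random sample of the training set becomes the validation
# set used to select the best model before evaluating on the holdout set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)
```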
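The "Experiment Setup" row fixes the optimizer and schedule: AdamW, learning rate 1e-3, batch size 64, and a constant schedule whose warmup covers 6% of the total training steps. The sketch below wires these values together with PyTorch and the Hugging Face Transformers scheduler helper; the model, dataset size, and epoch count are stand-ins of my own, not values from the paper.

```python
# Minimal sketch of the reported optimizer/scheduler configuration.
import torch
from torch.optim import AdamW
from transformers import get_constant_schedule_with_warmup

model = torch.nn.Linear(54, 7)           # stand-in model (assumed)
batch_size = 64                          # from the paper
num_epochs = 10                          # assumed, not reported in this row
num_train_examples = 100_000             # assumed

steps_per_epoch = num_train_examples // batch_size
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(0.06 * total_steps)   # warmup = 6% of total training steps

optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = get_constant_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps
)

# Inside the training loop, step the scheduler after each optimizer step:
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```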