Scalable Neural Network Kernels
Authors: Arijit Sehanobish, Krzysztof Marcin Choromanski, Yunfan Zhao, Kumar Avinava Dubey, Valerii Likhosherstov
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide rigorous theoretical analysis of all these concepts as well as an extensive empirical evaluation, ranging from point-wise kernel estimation to Transformers fine-tuning with novel adapter layers inspired by SNNKs. |
| Researcher Affiliation | Collaboration | Arijit Sehanobish (Independent Researcher); Krzysztof Choromanski (Google DeepMind & Columbia University); Yunfan Zhao (Harvard University); Avinava Dubey (Google Research); Valerii Likhosherstov (Waymo) |
| Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks. Procedures are described in mathematical notation and natural language within the text. |
| Open Source Code | Yes | The code is provided at https://github.com/arijitthegame/neural-network-kernels. |
| Open Datasets | Yes | We use three large UCI classification datasets Cover Type (∼510K points, dim = 54), HIGGS (∼11M points, dim = 28) and HEPMASS (∼11M points, dim = 28) to evaluate SNNK. |
| Dataset Splits | Yes | For these datasets, we use a 25% stratified random sampling from the training set to create a validation set, which is used to select the best model to run on the holdout set. (A loading-and-split sketch follows the table.) |
| Hardware Specification | Yes | All experiments on the smaller dataset are run on the free version of Google Colab (T4 GPU), while experiments on the larger datasets used a V100 GPU and 40GB A100 GPUs in Google Colab. |
| Software Dependencies | No | The paper mentions software used, such as "Transformers (Wolf et al., 2020) and adapter Transformer library (Pfeiffer et al., 2020)", but does not specify exact version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For these experiments, we use a learning rate of 1e-3, a batch size of 64, and a constant scheduler with warmup steps equal to 6% of the total number of training steps, with the AdamW optimizer (Loshchilov & Hutter, 2019). (An optimizer/scheduler sketch follows the table.) |
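
The "Open Datasets" and "Dataset Splits" rows describe the UCI data and a 25% stratified validation split taken from the training set. The sketch below is an assumption-laden illustration, not code from the paper: it loads Cover Type through scikit-learn's `fetch_covtype` (HIGGS and HEPMASS are available from the UCI Machine Learning Repository), and the holdout proportion, variable names, and random seed are placeholders I introduce for illustration.

```python
# Minimal sketch of the data-split protocol quoted above (not the authors' code).
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split

# Cover Type: 54-dimensional features, 7 forest-cover classes.
X, y = fetch_covtype(return_X_y=True)

# Carve out a holdout test set first (the 20% proportion is an assumption;
# the paper's report only states that a holdout set exists).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 25% stratified random sample of the training set becomes the validation
# set used to select the best model before evaluating on the holdout set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)
```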
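The "Experiment Setup" row fixes the optimizer and schedule: AdamW, learning rate 1e-3, batch size 64, and a constant schedule whose warmup covers 6% of the total training steps. The sketch below wires these values together with PyTorch and the Hugging Face Transformers scheduler helper; the model, dataset size, and epoch count are stand-ins of my own, not values from the paper.

```python
# Minimal sketch of the reported optimizer/scheduler configuration.
import torch
from torch.optim import AdamW
from transformers import get_constant_schedule_with_warmup

model = torch.nn.Linear(54, 7)           # stand-in model (assumed)
batch_size = 64                          # from the paper
num_epochs = 10                          # assumed, not reported in this row
num_train_examples = 100_000             # assumed

steps_per_epoch = num_train_examples // batch_size
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(0.06 * total_steps)   # warmup = 6% of total training steps

optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = get_constant_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps
)

# Inside the training loop, step the scheduler after each optimizer step:
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```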