Infinite attention: NNGP and NTK for deep attention networks

Authors: Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate attention kernels empirically, leading to a moderate improvement upon the previous state-of-the-art on CIFAR-10 for GPs without trainable kernels and advanced data preprocessing."
Researcher Affiliation | Collaboration | "¹University of Cambridge. Work done while interning at Google Brain. ²Google Brain. Correspondence to: Jiri Hron <jh2084@cam.ac.uk>."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Finally, since attention is often applied to text datasets, we release code allowing applications of NNGP/NTK models to variable-length sequences, including an example on the IMDb reviews dataset. Our implementation seamlessly extends the Neural Tangents library..." The Neural Tangents library (Novak et al., 2020) is cited with URL: https://github.com/google/neural-tangents (a minimal usage sketch follows this table).
Open Datasets | Yes | "We evaluate the attention NNGP/NTK kernels on the CIFAR-10 (Krizhevsky, 2009) and IMDb reviews (Maas et al., 2011) datasets."
Dataset Splits | Yes | "The smaller scale experiments were run on a randomly selected subset of six thousand observations from the training set, with the 2K/4K train/validation split. Selected hyperparameters were then employed in the larger scale experiment with the usual 50K/10K train/test split." For IMDb sentiment classification, the paper reports test accuracies of simple NNGP/NTK models on the 25K/25K train/test split.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper states that "Our experimental code utilises the JAX (Bradbury et al., 2018) and Neural Tangents (Novak et al., 2020) libraries," but does not specify version numbers for these libraries or for other dependencies such as Python or CUDA.
Experiment Setup | No | The paper states that "Exact details regarding data normalisation, hyperparameter tuning, and other experimental settings can be found in Appendix A.", deferring the specific experimental settings to the appendix rather than the main text.
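For context on the released code, below is a minimal sketch of how an attention NNGP/NTK model might be specified and queried with the Neural Tangents library that the paper extends. It is an illustration under assumed current `neural_tangents` APIs (`stax.GlobalSelfAttention`, `nt.predict.gradient_descent_mse_ensemble`), not the authors' exact experimental configuration; the architecture, widths, and data shapes are hypothetical.

import jax.random as random
import neural_tangents as nt
from neural_tangents import stax

# Hypothetical architecture: 1D convolution + global self-attention + readout,
# in the spirit of the paper's attention NNGP/NTK models (not the exact
# architecture used in its experiments).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(out_chan=256, filter_shape=(3,), padding='SAME'),
    stax.Relu(),
    stax.GlobalSelfAttention(
        n_chan_out=256, n_chan_key=256, n_chan_val=256, n_heads=4),
    stax.GlobalAvgPool(),
    stax.Dense(1),
)

# Toy data: 20 train / 5 test sequences of 32 tokens with 8 input channels.
key1, key2, key3 = random.split(random.PRNGKey(0), 3)
x_train = random.normal(key1, (20, 32, 8))
y_train = random.normal(key2, (20, 1))
x_test = random.normal(key3, (5, 32, 8))

# Closed-form infinite-width kernel between test and train inputs.
k_test_train = kernel_fn(x_test, x_train, 'nngp')  # or 'ntk'

# Exact GP posterior mean under the NNGP/NTK kernels
# (the default t=None corresponds to infinite training time).
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)
y_test_nngp = predict_fn(x_test=x_test, get='nngp')
y_test_ntk = predict_fn(x_test=x_test, get='ntk')

The key point of this design is that `stax.serial` returns a closed-form `kernel_fn` alongside the usual finite-width `init_fn`/`apply_fn`, so the same architecture description yields both a trainable network and its infinite-width NNGP/NTK kernels.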