Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes

Authors: Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan Suykens

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments verify our excellent performances and efficiency on in-distribution, distribution-shift and out-of-distribution benchmarks. |
| Researcher Affiliation | Academia | ESAT-STADIUS, KU Leuven, Belgium; LIONS, EPFL, Switzerland (most of the work was done at ESAT-STADIUS, KU Leuven). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is at https://github.com/yingyichen-cyy/KEP-SVGP. |
| Open Datasets | Yes | We conduct empirical evaluations on benchmarks including i) computer vision: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009); ii) language modelling: IMDB sentiment analysis (Maas et al., 2011), CoLA linguistic acceptability prediction (Warstadt et al., 2019). |
| Dataset Splits | Yes | For both CIFAR-10 and CIFAR-100, we randomly split the original training set into 90% training and 10% validation, yielding a training set of 45K samples and a validation set of 5K; the test set has 10K samples. (See the split sketch after the table.) |
| Hardware Specification | Yes | Comparisons of performance and efficiency on a single NVIDIA Tesla V100 SXM2 32 GB. |
| Software Dependencies | No | All experiments presented in this work are implemented with PyTorch; no dependency versions are given. |
| Experiment Setup | Yes | For both CIFAR-10 and CIFAR-100, we train a 7-layer Vision Transformer (ViT) (Dosovitskiy et al., 2021), optimized by Adam with batch size 128 and a cosine learning rate initialized at 10^-3, for 300 epochs. (See the training sketch after the table.) |
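
The 90/10 split in the Dataset Splits row can be reproduced with standard PyTorch utilities. Below is a minimal sketch assuming torchvision's CIFAR-10 loader; the paper does not report a split seed, so the `manual_seed(0)` here is purely illustrative.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Original CIFAR-10 training set (50K samples) and test set (10K samples).
train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())

# Random 90% / 10% split: 45K training samples, 5K validation samples.
n_val = len(train_full) // 10          # 5,000
n_train = len(train_full) - n_val      # 45,000
train_set, val_set = random_split(
    train_full, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # seed is an assumption
```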
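
The Experiment Setup row pins down the optimizer hyperparameters. The sketch below wires them together in plain PyTorch; the tiny linear model and random tensors are placeholders standing in for the paper's 7-layer ViT and the CIFAR data, and stepping the cosine schedule once per epoch is an assumption, since the paper does not specify the schedule granularity.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 300        # training length reported in the paper
BATCH_SIZE = 128    # batch size reported in the paper

# Placeholder model and data; the paper trains a 7-layer ViT on CIFAR-10/100.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)

# Adam with initial learning rate 1e-3, annealed by a cosine schedule.
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch (assumption)
```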