Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
Authors: Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan Suykens
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments verify our excellent performances and efficiency on in-distribution, distribution-shift and out-of-distribution benchmarks. |
| Researcher Affiliation | Academia | ESAT-STADIUS, KU Leuven, Belgium; LIONS, EPFL, Switzerland (most of the work was done at ESAT-STADIUS, KU Leuven). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is at https://github.com/yingyichen-cyy/KEP-SVGP. |
| Open Datasets | Yes | We conduct empirical evaluations on benchmarks including i) computer vision: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009); ii) language modelling: IMDB sentiment analysis (Maas et al., 2011), CoLA linguistic acceptability prediction (Warstadt et al., 2019). |
| Dataset Splits | Yes | For both CIFAR-10 and CIFAR-100, we randomly split the original training set into 90% training and 10% validation sets, leading to a training set of 45K samples and a validation set of 5K samples. The test set has 10K samples. |
| Hardware Specification | Yes | Comparisons of performance and efficiency on a single NVIDIA Tesla V100 SXM2 32 GB. |
| Software Dependencies | No | All experiments presented in this work are implemented with PyTorch. |
| Experiment Setup | Yes | For both CIFAR-10 and CIFAR-100, we train a 7-layer Vision Transformer (ViT) (Dosovitskiy et al., 2021), optimized by Adam with batch size 128 and a cosine learning rate schedule initialized at 10^-3 for 300 epochs. |
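
Taken together, the dataset-split and experiment-setup rows describe a standard training configuration. Below is a minimal PyTorch sketch of that setup for CIFAR-10, not the authors' released code: `build_vit` is a hypothetical placeholder for the 7-layer ViT, and the KEP-SVGP attention modules, data augmentation, and evaluation loop are omitted.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Paper-specific augmentations are omitted from this sketch.
transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# 90% / 10% random split of the 50K-sample training set -> 45K train, 5K validation.
train_set, val_set = random_split(
    full_train, [45000, 5000], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)
test_loader = DataLoader(test_set, batch_size=128)

# Hypothetical factory for a 7-layer ViT; the released repository defines the actual model.
model = build_vit(depth=7, num_classes=10)

# Adam with a cosine learning-rate schedule starting at 1e-3, decayed over 300 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(300):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted setup, the same recipe applies to CIFAR-100, with the dataset class and `num_classes` changed accordingly.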