Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Authors: Aaron J Havens, Alexandre Araujo, Huan Zhang, Bin Hu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our theoretical findings by computing non-vacuous certified ℓ2-robustness for vision transformers on the CIFAR-10 and SVHN datasets.
Researcher Affiliation | Academia | (1) ECE & CSL, University of Illinois Urbana-Champaign; (2) ECE, New York University.
Pseudocode | No | The paper provides theoretical analyses and mathematical formulations but does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.
Open Datasets | Yes | CIFAR-10 and SVHN datasets.
Dataset Splits | No | The paper references the CIFAR-10 and SVHN datasets and mentions a 'subset of 1000 samples' for verification, but does not provide explicit train/validation/test split percentages, sample counts, or citations to predefined splits for reproducibility.
Hardware Specification | No | The paper reports only the 'average wall-clock time in seconds per sample on our local machine' without specifying the actual hardware used.
Software Dependencies | No | The paper notes that auto_LiRPA supports ℓ2 perturbation models and has previously been used for robustness certification of dot-product attention (Shi et al., 2020), but it does not list software versions or dependencies. (A generic auto_LiRPA usage sketch follows this table.)
Experiment Setup | Yes | We use a standard ViT architecture with residual attention and feed-forward blocks and a patch size of 16. Additionally, LayerProject with R = 1 is applied before each attention head so that the spectral norm of the entire input X is controlled. We study the effect of different ViT architecture parameters such as the number of attention heads, the number of layers, and the Lipschitz constant constraint of the attention weights. For these ViT models trained on CIFAR-10 with layers l ∈ {3, 4, 5}, we examine the certified robust accuracy for ℓ2 perturbation sizes ϵ ∈ {0.02, 0.05, 0.1}. (A Lipschitz-margin certification sketch also follows this table.)
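
For context on the Software Dependencies row: the paper points to auto_LiRPA as the reference tool for ℓ2 bound computation. Below is a minimal sketch of the typical auto_LiRPA workflow; the model, input, and radius are illustrative placeholders, and this is not claimed to be the paper's certification pipeline.

```python
# Minimal auto_LiRPA sketch for bounding logits under an l2 perturbation.
# `torch_model`, `x`, and `eps` are placeholders, not the paper's settings.
import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

def logit_bounds_l2(torch_model, x, eps=0.1):
    """Lower/upper bounds on the logits over the l2 ball of radius eps around x."""
    bounded_model = BoundedModule(torch_model, torch.empty_like(x))
    ptb = PerturbationLpNorm(norm=2, eps=eps)  # l2 perturbation model
    x_bounded = BoundedTensor(x, ptb)
    lb, ub = bounded_model.compute_bounds(x=(x_bounded,), method="backward")
    return lb, ub
```

A sample is then certified at radius eps if the lower bound of its true-class logit exceeds the upper bounds of every other class.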
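
For context on the Experiment Setup row: certified robust accuracy at fixed ℓ2 radii is commonly obtained from a Lipschitz-margin certificate, where a correctly classified sample is certified at radius ε whenever its prediction margin exceeds √2·L·ε for a (local) Lipschitz bound L. The sketch below illustrates that standard recipe with a generic `lipschitz_bound` placeholder; it is an assumption about the evaluation loop, not the paper's actual code.

```python
# Hedged sketch of Lipschitz-margin certification at a fixed l2 radius.
# `lipschitz_bound` stands in for whatever (local) Lipschitz estimate is available.
import math
import torch

def certified_robust_accuracy(model, loader, lipschitz_bound, eps, device="cpu"):
    certified, total = 0, 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            top2 = logits.topk(2, dim=1)
            pred = top2.indices[:, 0]
            # margin between the predicted class and the runner-up
            margin = top2.values[:, 0] - top2.values[:, 1]
            # certified if correctly classified and the margin survives the
            # worst-case logit change sqrt(2) * L * eps over the l2 ball
            ok = (pred == y) & (margin > math.sqrt(2.0) * lipschitz_bound * eps)
            certified += ok.sum().item()
            total += y.numel()
    return certified / total
```

With ϵ ∈ {0.02, 0.05, 0.1}, such a function would be called once per radius, e.g. certified_robust_accuracy(model, test_loader, L, 0.05).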