Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
Authors: Aaron J Havens, Alexandre Araujo, Huan Zhang, Bin Hu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings by computing non-vacuous certified ℓ2-robustness for vision transformers on CIFAR-10 and SVHN datasets. |
| Researcher Affiliation | Academia | 1 ECE & CSL, University of Illinois Urbana-Champaign 2 ECE, New York University. |
| Pseudocode | No | The paper provides theoretical analyses and mathematical formulations but does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code for LoFAST is available at https://github.com/AaronHavens/LoFAST. |
| Open Datasets | Yes | CIFAR-10 and SVHN datasets. |
| Dataset Splits | No | The paper references CIFAR-10 and SVHN datasets and mentions a 'subset of 1000 samples' for verification, but does not provide explicit train/validation/test split percentages, sample counts, or citations to predefined splits for reproducibility. |
| Hardware Specification | No | average wall-clock time in seconds per sample on our local machine. |
| Software Dependencies | No | auto_LiRPA supports ℓ2 perturbation models and has in the past been used for robustness certification of dot-product attention (Shi et al., 2020). |
| Experiment Setup | Yes | We use a standard ViT architecture with residual attention and feed-forward blocks and a patch size of 16. Additionally, LayerProject with R = 1 is applied before each attention head so that the spectral norm of the entire input X is controlled. We study the effect of different ViT architecture parameters such as the number of attention heads, number of layers, and Lipschitz constant constraint of the attention weights. For these ViT models trained on CIFAR-10 with layers l ∈ {3, 4, 5}, we examine the certified robust accuracy for ℓ2 perturbation sizes ϵ ∈ {0.02, 0.05, 0.1}. |
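The evaluation protocol quoted above (certified robust accuracy at ℓ2 perturbation sizes ϵ ∈ {0.02, 0.05, 0.1}, measured on a 1000-sample subset) can be sketched as follows. This is a minimal sketch, not the paper's implementation: `certified_radius` is a hypothetical stand-in for the Lipschitz-based certificate computed by LoFAST, implemented here as a fixed placeholder so the loop is runnable.

```python
def certified_radius(model, x, y):
    # Hypothetical stand-in: the real method would return the certified
    # l2 radius around x within which the prediction y provably
    # cannot change. Here it is a fixed placeholder value.
    return 0.06

def certified_accuracy(model, samples, epsilons):
    """Fraction of samples certified robust at each perturbation size eps:
    a sample counts as certified at eps if its certified radius >= eps."""
    counts = {eps: 0 for eps in epsilons}
    for x, y in samples:
        r = certified_radius(model, x, y)
        for eps in epsilons:
            if r >= eps:
                counts[eps] += 1
    n = len(samples)
    return {eps: counts[eps] / n for eps in epsilons}

# Evaluate on a dummy 1000-sample subset, mirroring the paper's setup.
samples = [(None, None)] * 1000
acc = certified_accuracy(model=None, samples=samples,
                         epsilons=(0.02, 0.05, 0.1))
print(acc)
```

With the placeholder radius of 0.06, every sample is certified at ϵ = 0.02 and 0.05 but none at ϵ = 0.1; a real run would replace `certified_radius` with the paper's certificate and `samples` with the CIFAR-10 or SVHN subset.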