I/O Complexity of Attention, or How Optimal is FlashAttention?
Authors: Barna Saha, Christopher Ye
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We resolve the above question in its full generality by showing an I/O complexity lower bound that matches the upper bound provided by FlashAttention for any values of M ≥ d² within any constant factors. Moreover, our lower bounds do not rely on using combinatorial matrix multiplication for computing the attention matrix: even if one uses fast matrix multiplication, the above I/O complexity bounds cannot be improved. Further, we give a better algorithm with lower I/O complexity for M < d², and show that it is optimal for combinatorial algorithms. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA. |
| Pseudocode | Yes | Algorithm 1 SQUARETILINGATTENTION(Q, K, V, M) |
| Open Source Code | No | The paper does not contain a statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not use or reference any publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not specify any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe experimental setup details such as hyperparameters or training configurations. |
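The pseudocode row above references Algorithm 1, SQUARETILINGATTENTION(Q, K, V, M), which processes the N × N attention score matrix in square tiles sized to the memory budget M. The following is a minimal sketch of that general tiling idea using the standard online-softmax accumulators; it is an illustration only, not a reproduction of the paper's Algorithm 1, and the tile side B = ⌊√M⌋ is an assumed choice for the sketch.

```python
import numpy as np

def square_tiling_attention(Q, K, V, M):
    """Illustrative tiled attention: softmax(Q K^T) V computed over
    B x B square tiles of the score matrix (B chosen from the memory
    budget M), so the full N x N matrix is never materialized.
    Uses running row-maxima and denominators (online softmax)."""
    N, d = Q.shape
    B = max(1, int(np.sqrt(M)))           # assumed tile side from budget M
    O = np.zeros((N, d))                  # output accumulator
    m = np.full(N, -np.inf)               # running row maxima
    l = np.zeros(N)                       # running softmax denominators
    for i in range(0, N, B):              # tile over query rows
        for j in range(0, N, B):          # tile over key/value rows
            S = Q[i:i+B] @ K[j:j+B].T     # one B x B tile of scores
            m_new = np.maximum(m[i:i+B], S.max(axis=1))
            scale = np.exp(m[i:i+B] - m_new)   # rescale old accumulators
            P = np.exp(S - m_new[:, None])
            l[i:i+B] = l[i:i+B] * scale + P.sum(axis=1)
            O[i:i+B] = O[i:i+B] * scale[:, None] + P @ V[j:j+B]
            m[i:i+B] = m_new
    return O / l[:, None]
```

Because each tile is processed once and only O(B² + Bd) values are held at a time, a sketch like this makes the tiling structure behind the paper's I/O accounting concrete; the exact I/O-optimal schedule for M < d² is the paper's contribution, not what this snippet shows.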