I/O Complexity of Attention, or How Optimal is FlashAttention?

Authors: Barna Saha, Christopher Ye

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We resolve the above question in its full generality by showing an I/O complexity lower bound that matches the upper bound provided by FlashAttention for any values of M ≥ d² within any constant factors. Moreover, our lower bounds do not rely on using combinatorial matrix multiplication for computing the attention matrix: even if one uses fast matrix multiplication, the above I/O complexity bounds cannot be improved. Further, we give a better algorithm with lower I/O complexity for M < d², and show that it is optimal for combinatorial algorithms.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA.
Pseudocode | Yes | Algorithm 1 SQUARETILINGATTENTION(Q, K, V, M)
Open Source Code | No | The paper does not contain a statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not use or reference any publicly available datasets.
Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not specify any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe experimental setup details such as hyperparameters or training configurations.
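For readers unfamiliar with the tiled-attention algorithms the paper analyzes, the following is a minimal NumPy sketch of computing softmax(QKᵀ)V in square tiles with an online (running) softmax. It is an illustration in the spirit of Algorithm 1, not the authors' exact SQUARETILINGATTENTION: the tile size B stands in for the cache-dependent block size (roughly √M for cache size M), and the running-max/normalizer bookkeeping follows the standard online-softmax recurrence.

```python
import numpy as np

def tiled_attention(Q, K, V, B):
    """Compute softmax(Q @ K.T) @ V by streaming B x B tiles of the
    attention matrix, so only O(B^2 + B*d) values are live at once.
    Illustrative only; B plays the role of sqrt(M) for cache size M."""
    n, d = Q.shape
    out = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max per query row
    row_sum = np.zeros(n)           # running softmax normalizer per row
    for i in range(0, n, B):        # tile of query rows
        for j in range(0, n, B):    # tile of key/value rows
            S = Q[i:i+B] @ K[j:j+B].T                     # score tile
            m_new = np.maximum(row_max[i:i+B], S.max(axis=1))
            scale = np.exp(row_max[i:i+B] - m_new)        # rescale old state
            P = np.exp(S - m_new[:, None])                # unnormalized probs
            out[i:i+B] = out[i:i+B] * scale[:, None] + P @ V[j:j+B]
            row_sum[i:i+B] = row_sum[i:i+B] * scale + P.sum(axis=1)
            row_max[i:i+B] = m_new
    return out / row_sum[:, None]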