Fast Attention Requires Bounded Entries
Authors: Josh Alman, Zhao Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present two results, showing that there is a sharp transition at B = Θ(√(log n)). If d = O(log n) and B = o(√(log n)), there is an n^{1+o(1)} time algorithm to approximate Att(Q, K, V) up to 1/poly(n) additive error. If d = O(log n) and B = Θ(√(log n)), assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory, it is impossible to approximate Att(Q, K, V) up to 1/poly(n) additive error in truly subquadratic time n^{2−Ω(1)}. (The quantity Att(Q, K, V) is sketched in code below the table.) |
| Researcher Affiliation | Collaboration | Josh Alman (josh@cs.columbia.edu), Columbia University; Zhao Song (zsong@adobe.com), Adobe Research. |
| Pseudocode | Yes | Algorithm 1: "Our Algorithm" (the low-rank mechanism behind it is sketched in the second code example below the table). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | No | This is a theoretical paper that does not involve training models on datasets. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical validation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not discuss specific software dependencies or versions for implementation. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details like hyperparameters or training configurations. |
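For context on the object the dichotomy concerns: the paper defines Att(Q, K, V) = D⁻¹AV, where A = exp(QKᵀ) is applied entrywise and D = diag(A·1ₙ). Below is a minimal NumPy sketch of the exact Θ(n²d)-time computation; the dimensions, the random seed, and the constant 0.5 are illustrative assumptions, not values from the paper.

```python
import math

import numpy as np

def exact_attention(Q, K, V):
    """Exact Att(Q, K, V) = D^{-1} A V with A = exp(Q K^T) taken entrywise
    and D = diag(A 1_n). Materializes the n x n matrix A, so it runs in
    Theta(n^2 d) time -- the cost the paper asks whether we can beat."""
    A = np.exp(Q @ K.T)                      # n x n attention matrix
    D_inv = 1.0 / A.sum(axis=1)              # inverse row sums, i.e. D^{-1}
    return (A * D_inv[:, None]) @ V

# Illustrative parameters (assumed, not from the paper): d = O(log n),
# entry magnitude B below the Theta(sqrt(log n)) threshold.
rng = np.random.default_rng(0)
n = 1024
d = math.ceil(math.log(n))                   # d = O(log n) regime
B = 0.5 * math.sqrt(math.log(n))             # B = o(sqrt(log n)) side
Q, K, V = (rng.uniform(-B, B, size=(n, d)) for _ in range(3))
print(exact_attention(Q, K, V).shape)        # (1024, 7)
```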
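The paper's n^{1+o(1)}-time result (Algorithm 1 in the paper) rests on the fact that when B = o(√(log n)), exp(QKᵀ) is well approximated by a low-rank matrix that can be applied without ever forming the n × n matrix. The sketch below illustrates that mechanism with a plain truncated-Taylor feature map; it is not the paper's construction, which uses a sharper polynomial approximation of exp, and the degree g, the sizes, and the constant 0.2 are assumptions chosen for the demo.

```python
import math

import numpy as np

def tensor_features(X, g):
    """Map each row x to the concatenation of its scaled tensor powers
    x^{(t)} / sqrt(t!), t = 0..g, so that for rows q and k:
        phi(q) . phi(k) = sum_{t<=g} (q . k)^t / t!  ~  exp(q . k)."""
    n, _ = X.shape
    feats = [np.ones((n, 1))]                         # t = 0 term
    cur = np.ones((n, 1))
    for t in range(1, g + 1):
        # row-wise Kronecker product: d^{t-1} columns -> d^t columns
        cur = (cur[:, :, None] * X[:, None, :]).reshape(n, -1)
        feats.append(cur / math.sqrt(math.factorial(t)))
    return np.hstack(feats)

def approx_attention(Q, K, V, g):
    """Apply the rank-r surrogate A~ = Phi(Q) Phi(K)^T, r = sum_{t<=g} d^t,
    in O(n * r) time per output column, never forming the n x n matrix."""
    U, W = tensor_features(Q, g), tensor_features(K, g)
    num = U @ (W.T @ V)                               # approximates A V
    den = U @ W.sum(axis=0)                           # approximates A 1_n
    return num / den[:, None]

# Demo (assumed parameters): small B keeps |q . k| small, so a low-degree
# Taylor truncation already gives small additive error.
rng = np.random.default_rng(0)
n, d, g = 512, 4, 6
B = 0.2 * math.sqrt(math.log(n))
Q, K, V = (rng.uniform(-B, B, size=(n, d)) for _ in range(3))
A = np.exp(Q @ K.T)                                   # exact, for comparison
exact = (A / A.sum(axis=1, keepdims=True)) @ V
err = np.abs(approx_attention(Q, K, V, g) - exact).max()
print(f"max additive error: {err:.1e}")
```

The hardness side of the dichotomy says that once B reaches Θ(√(log n)), no such trick can run in truly subquadratic time unless SETH fails: the entries of QKᵀ grow large enough that any accurate low-degree surrogate for exp breaks down.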