Fast Attention Requires Bounded Entries

Authors: Josh Alman, Zhao Song

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present two results, showing that there is a sharp transition at B = Θ(√(log n)). If d = O(log n) and B = o(√(log n)), there is an n^{1+o(1)}-time algorithm to approximate Att(Q, K, V) up to 1/poly(n) additive error. If d = O(log n) and B = Θ(√(log n)), then assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory, it is impossible to approximate Att(Q, K, V) up to 1/poly(n) additive error in truly subquadratic time n^{2-Ω(1)}. (The exact computation being approximated is sketched after the table.)
Researcher Affiliation | Collaboration | Josh Alman (josh@cs.columbia.edu, Columbia University) and Zhao Song (zsong@adobe.com, Adobe Research).
Pseudocode | Yes | The paper presents Algorithm 1 ("Our Algorithm"). An illustrative sketch of the low-rank idea behind it appears after the table.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository for the methodology described.
Open Datasets | No | This is a theoretical paper that does not involve training models on datasets.
Dataset Splits | No | This is a theoretical paper and does not involve empirical validation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not discuss specific software dependencies or versions for implementation.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details such as hyperparameters or training configurations.
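
For reference, the quantity whose approximability the paper studies is Att(Q, K, V) = D^{-1} A V, where A = exp(QKᵀ) is applied entrywise, D = diag(A·1_n), and B bounds the magnitude of the input entries. The following minimal sketch computes it exactly in the naive O(n²·d) time that the paper's upper bound improves on; the function name and the concrete n, d, B values are illustrative choices, not taken from the paper.

```python
import numpy as np

def attention_exact(Q, K, V):
    """Exact Att(Q, K, V) = D^{-1} A V, with A = exp(Q K^T) taken
    entrywise and D = diag(A 1_n). Materializing A costs O(n^2)
    space and O(n^2 d) time, the cost the paper's algorithm avoids."""
    A = np.exp(Q @ K.T)             # n x n attention matrix
    row_sums = A.sum(axis=1)        # A 1_n, the diagonal of D
    return (A @ V) / row_sums[:, None]

# Illustrative instance (sizes and entry bound are hypothetical choices):
rng = np.random.default_rng(0)
n, d = 256, 4                       # d = O(log n) regime
B = 0.4                             # well below the sqrt(log n) threshold scale
Q, K, V = (rng.uniform(-B, B, size=(n, d)) for _ in range(3))
out = attention_exact(Q, K, V)
print(out.shape)                    # (n, d)
```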
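The paper's Algorithm 1 itself is not reproduced in this summary, but its n^{1+o(1)} upper bound rests on the polynomial method: when B = o(√(log n)), the entrywise exponential can be replaced by a low-degree polynomial, which approximately factors A into a product of two thin matrices, so A·V and A·1_n can each be computed in time linear in n. The sketch below illustrates this idea with a plain degree-g Taylor truncation and explicit tensor-product features; the truncation degree, the helper names, and the feature construction are assumptions for illustration, whereas the paper uses a more carefully chosen polynomial and error analysis.

```python
import numpy as np
from itertools import product
from math import factorial

def taylor_features(X, g):
    """Feature map phi with <phi(q), phi(k)> = sum_{t=0..g} <q, k>^t / t!,
    the degree-g Taylor truncation of exp(<q, k>). The feature dimension
    is sum_t d^t, so this is only cheap when d and g are small."""
    n, d = X.shape
    blocks = [np.ones((n, 1))]                    # t = 0 term
    for t in range(1, g + 1):
        cols = [np.prod(X[:, list(idx)], axis=1)  # monomial x_{i1} * ... * x_{it}
                for idx in product(range(d), repeat=t)]
        blocks.append(np.stack(cols, axis=1) / np.sqrt(factorial(t)))
    return np.concatenate(blocks, axis=1)

def attention_lowrank(Q, K, V, g=4):
    """Approximate Att(Q, K, V) without ever forming the n x n matrix:
    with U = phi(Q) and W = phi(K), A ~= U W^T, so A V ~= U (W^T V)
    and A 1_n ~= U (W^T 1_n), each computable in time linear in n."""
    U, W = taylor_features(Q, g), taylor_features(K, g)
    row_sums = U @ W.sum(axis=0)                  # ~= A 1_n (positive when
                                                  # the input entries are small)
    return (U @ (W.T @ V)) / row_sums[:, None]

# With the small-entry Q, K, V from the previous sketch, the gap
# np.abs(attention_lowrank(Q, K, V, g) - attention_exact(Q, K, V)).max()
# shrinks rapidly as the truncation degree g grows.
```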