Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fast Attention Requires Bounded Entries
Authors: Josh Alman, Zhao Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present two results, showing that there is a sharp transition at $B = \Theta(\sqrt{\log n})$. If $d = O(\log n)$ and $B = o(\sqrt{\log n})$, there is an $n^{1+o(1)}$ time algorithm to approximate $\mathrm{Att}(Q, K, V)$ up to $1/\mathrm{poly}(n)$ additive error. If $d = O(\log n)$ and $B = \Theta(\sqrt{\log n})$, assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory, it is impossible to approximate $\mathrm{Att}(Q, K, V)$ up to $1/\mathrm{poly}(n)$ additive error in truly subquadratic time $n^{2-\Omega(1)}$. |
| Researcher Affiliation | Collaboration | EMAIL. Columbia University. EMAIL. Adobe Research. |
| Pseudocode | Yes | Algorithm 1 Our Algorithm |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | No | This is a theoretical paper that does not involve training models on datasets. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical validation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not discuss specific software dependencies or versions for implementation. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details like hyperparameters or training configurations. |
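
For orientation, the quantity Att(Q, K, V) quoted in the Research Type row can be sketched as the straightforward quadratic-time baseline that the paper's results are measured against. This is a minimal illustration and not the paper's Algorithm 1. It assumes the usual softmax-attention definition, where each output row is a weighted average of the rows of V with weights proportional to exp(<Q_i, K_j> / d), and it assumes that B bounds the absolute value of every entry of Q, K, and V. The names `exact_attention` and `max_abs_entry` are hypothetical.

```python
import math


def exact_attention(Q, K, V):
    """Quadratic-time baseline for Att(Q, K, V).

    Q and K are n x d lists of floats; V is an n x m list of floats.
    Assumption of this sketch: weights are exp(<Q_i, K_j> / d),
    normalized so that each row of weights sums to 1.
    """
    n, d = len(Q), len(Q[0])
    m = len(V[0])
    out = []
    for i in range(n):
        # Unnormalized weights for query row i against every key row j.
        weights = [
            math.exp(sum(q * k for q, k in zip(Q[i], K[j])) / d)
            for j in range(n)
        ]
        total = sum(weights)
        # Weighted average of the rows of V.
        out.append(
            [sum(weights[j] * V[j][c] for j in range(n)) / total for c in range(m)]
        )
    return out


def max_abs_entry(*matrices):
    """Largest absolute entry across the given matrices (the bound B, as assumed here)."""
    return max(abs(x) for M in matrices for row in M for x in row)
```

The double loop over query and key rows is what makes this baseline take time quadratic in n. Per the quoted abstract, that cost can be reduced to n^{1+o(1)} only when the entries are small enough, which is why `max_abs_entry` is worth computing before reaching for an approximation.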