Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Authors: Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, Christopher Ré
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate that FLASHATTENTION speeds up model training and improves model quality by modeling longer context. We also benchmark the runtime and memory footprint of FLASHATTENTION and block-sparse FLASHATTENTION compared to prior attention implementations. ... 4 Experiments |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Department of Computer Science and Engineering, University at Buffalo, SUNY |
| Pseudocode | Yes | Algorithm 0 Standard Attention Implementation |
| Open Source Code | Yes | We open-source FLASHATTENTION to make it easier to build on this primitive. FLASHATTENTION code is available at https://github.com/HazyResearch/flash-attention |
| Open Datasets | Yes | We train BERT-large (seq. length 512) ... GPT2 (seq. length 1K) on the large OpenWebText dataset [34]... long-range arena (LRA [83]) benchmark... MIMIC-III [49] and ECtHR [6, 7] datasets. |
| Dataset Splits | Yes | Appendix E includes plots of the validation perplexity throughout training, confirming that FLASHATTENTION is as numerically stable as the baselines and produces the same training / validation curves. |
| Hardware Specification | Yes | on one A100 GPU with 40 GB HBM |
| Software Dependencies | No | The paper mentions 'PyTorch' and states 'Our implementation uses Apex's FMHA code (https://github.com/NVIDIA/apex/tree/master/apex/contrib/csrc/fmha) as a starting point,' but it does not specify version numbers for any software dependencies such as PyTorch, CUDA, or Apex itself. |
| Experiment Setup | No | The paper states 'Additional experiment details are in Appendix E.' and, for LRA, 'We follow the implementation and experimental setting in Tay et al. [83] and Xiong et al. [94].' However, it does not provide specific hyperparameter values or system-level training settings within the main body of the paper. |