Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Flash Invariant Point Attention

Authors: Andrew Liu, Axel Elaldi, Nicholas Franklin, Nathan Russell, Gurinder S. Atwal, Yih-En Ban, Olivia Viessmann

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show empirically that Flash IPA exceeds the validation performance of IPA in benchmarking models and datasets. We then demonstrate the memory and compute time efficiency by retraining benchmarking models more efficiently and without length restrictions on the data. 3 Experiments
Researcher Affiliation	Industry	Flagship Pioneering Cambridge, MA, United States [anliu,aelaldi,nfranklin,nrussell,matwal,aban,oviessmann]@flagshippioneering.com
Pseudocode	Yes	We provide the Flash-Attention pseudo-code in appendix A.3. Building on Flash Attention-1, Flash Attention-2 further reduced the number of non-matmul FLOPs, increased parallelism across thread blocks, and distributed work between warps to reduce communication through shared memory [21]. ... Algorithm 1 Flash IPA with factorized pair representations ... Algorithm 2 Flash Attention-2 via online softmax (Dao 2024)
Open Source Code	Yes	Flash IPA is available at https://github.com/flagshippioneering/flash_ipa. ... We provide Flash IPA as an importable uv package at https://github.com/flagshippioneering/flash_ipa with an API similar to existing repositories using IPA to facilitate drop-in usage.
Open Datasets	Yes	We reran the PDB data pre-processing pipeline by the authors, which yielded a total of 40,492 single-chain protein monomers for training. ... We ingested the same BGSU version 3.382 of the RNASolo2 dataset [24], comprising a total of 14,995 structures (see Appendix A.5).
Dataset Splits	No	The paper describes using a length cut-off for training data and mentions re-running preprocessing pipelines from other authors. For example: "The original training used a maximum length cut-off of 512 residues, which results in a 10% reduction of the training data to 36,600 structures." and "First we re-trained RNA-Frame Flow with original and Flash IPA on all structures within 40 to 150 residue (6,030 structures total), as proposed by the authors." However, it does not explicitly provide specific training/test/validation splits (e.g., percentages, counts, or explicit reference to predefined splits from a cited source).
Hardware Specification	Yes	All experiments were run on L40S GPU instances with 48 GB HBM memory. ... We match the training strategy of the original authors and train on this reduced dataset on 4 GPUs with DDP. ... a comparable training on a single GPU instance.
Software Dependencies	No	The paper mentions several tools and models used (e.g., Protein-MPNN [23], ESMFold [4], gRNAde [25], RhoFold [26]) and refers to Flash Attention versions, but it does not specify version numbers for general software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	Flash IPA hyperparameters were matched to the IPA parameters chosen by the original authors (embedding sizes, hidden dimensions, number of heads, etc.). ... we reduced Flash IPA hidden dimension to 128 and used 5 blocks (instead of 4) to match model parameter sizes (17.4M versus 17.1M, theirs versus ours). ... the original implementation heuristically chose an effective batch size according to the quadratic rule $\mathrm{eff}_{bs} = \max\left(\mathrm{round}\left(\frac{500{,}000 \cdot n_{\mathrm{GPUs}}}{N^2}\right), 1\right)$, which kept GPU memory at approximately 90% throughout training. For Flash IPA we were able to achieve a linear effective batch size $\mathrm{eff}_{bs} = \max\left(\mathrm{round}\left(\frac{20{,}000 \cdot n_{\mathrm{GPUs}}}{N}\right), 1\right)$ ... All models were trained for a fixed compute time of 20 hours
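
The two effective batch size rules quoted in the Experiment Setup row can be compared with a short sketch. It assumes the quadratic rule reads as max(round(500,000 · n_GPUs / N²), 1) and the linear rule as max(round(20,000 · n_GPUs / N), 1), where N is the sequence length in residues. The function names below are hypothetical and are not taken from the flash_ipa repository or the original training code.

```python
def effective_batch_size(n_res: int, n_gpus: int, budget: int, power: int) -> int:
    """Return max(round(budget * n_gpus / n_res**power), 1).

    Assumed reading of the rules quoted above; `budget` is 500,000 for the
    quadratic rule and 20,000 for the linear rule.
    """
    if n_res <= 0 or n_gpus <= 0:
        raise ValueError("n_res and n_gpus must be positive")
    # Python's round() rounds exact halves to the nearest even integer;
    # the paper does not say which rounding convention it uses.
    return max(round(budget * n_gpus / n_res**power), 1)


def quadratic_rule(n_res: int, n_gpus: int = 1) -> int:
    """Hypothetical helper for the original IPA rule (memory grows with N^2)."""
    return effective_batch_size(n_res, n_gpus, budget=500_000, power=2)


def linear_rule(n_res: int, n_gpus: int = 1) -> int:
    """Hypothetical helper for the Flash IPA rule (memory grows with N)."""
    return effective_batch_size(n_res, n_gpus, budget=20_000, power=1)


if __name__ == "__main__":
    for n_res in (100, 512, 2000):
        print(n_res, quadratic_rule(n_res, n_gpus=4), linear_rule(n_res, n_gpus=4))
```

Under this reading, the quadratic rule falls to the floor of one sample per step once N² exceeds the budget, while the linear rule still yields batches of several samples at the same length.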