Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Strassen Attention, Split VC Dimension and Compositionality in Transformers

Authors: Alexander Kozachinskiy, Felipe Urrutia, Hector Orellana, Tomasz Steifer, Germán Pizarro, Matías Fuentes, Francisco Meza Vásquez, Cristian Buc Calderon, Cristobal Rojas

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental To complement our theoretical findings, we experimentally studied Strassen attention and compared it against standard (Vaswani et al., 2017), higher-order attention (Sanford et al., 2023), and triangular attention (Bergen et al., 2021). Our results help to disentangle all these attention mechanisms, highlighting their strengths and limitations. In particular, Strassen attention outperforms standard attention significantly on all the tasks. Altogether, understanding the theoretical limitations can guide research towards scalable attention mechanisms that improve the reasoning abilities of Transformers.
Researcher Affiliation Academia Alexander Kozachinskiy CENIA EMAIL Felipe Urrutia University of Chile & CENIA EMAIL Hector Jimenez University of Chile & CENIA EMAIL Tomasz Steifer Institute of Fundamental Technological Research, Polish Academy of Sciences EMAIL Germán Pizarro CENIA EMAIL Matías Fuentes IMC, Pontifical Catholic University of Chile EMAIL Francisco Meza IMC, Pontifical Catholic University of Chile EMAIL Cristian B. Calderon CENIA EMAIL Cristóbal Rojas Institute for Mathematical and Computational Engineering Pontifical Catholic University of Chile & CENIA EMAIL
Pseudocode Yes Algorithm 1 outlines the data generation procedure. Algorithm 2 provides the detailed data generation procedure. Algorithm 3 outlines the data generation procedure. Algorithm 4 provides the detailed data generation procedure.
Open Source Code Yes Code for our experiments can be found at furrutiav/strassen-attention-neurips25.
Open Datasets Yes We create dedicated datasets to evaluate our models across all four tasks. Each task consists of 5 104 examples. Below, we detail the data generation process for each task, with explanations of key components and structures.
Dataset Splits Yes The dataset is split into a training and validation set. We randomly select 90% of the data for training, and the remaining 10% is used for validation.
Hardware Specification Yes We conduct all experiments on high-performance NVIDIA GPUs. Specifically, we execute tasks requiring extensive computational resources, such as Match3 and Quotient Binary Relation Composition, on NVIDIA A100 GPUs with 80GB of memory. For tasks with lower computational demands, such as Function Composition and Binary Relation Composition, we use NVIDIA A40 GPUs with 48GB of memory.
Software Dependencies Yes We implement all models using the PyTorch framework [1] with the Opt-Einsum library [5].
Experiment Setup Yes Additionally, training parameters such as learning rate, batch size, and training duration are specified equally for each task across models, as summarized in Table 6.
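
The Dataset Splits entry above describes a random 90%/10% training/validation split. A minimal sketch of such a split follows, using only the Python standard library. The function name `split_train_val`, the `seed` parameter, and the placeholder data are assumptions made for illustration; they are not taken from the authors' code, which may implement the split differently.

```python
import random


def split_train_val(examples, train_fraction=0.9, seed=0):
    """Shuffle a copy of `examples` and split it into (train, validation) lists.

    Hypothetical helper: the name, signature, and seeding are illustrative
    assumptions, not the authors' implementation.
    """
    rng = random.Random(seed)       # local RNG so the split is repeatable
    shuffled = list(examples)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data standing in for one task's generated examples.
examples = list(range(1000))
train, val = split_train_val(examples)
```

Fixing the seed makes the split repeatable across runs, which matters when validation results are compared across models.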