Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Authors: Jiecheng Lu, Xu Han, Yan Sun, Viresh Pati, Yubin Kim, Siddhartha Somani, Shihao Yang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate both linear-time ZeroS and quadratic-time ZeroS-SM on recent in-context learning benchmarks, along with experiments on NLP, image, and time series tasks. In all experiments, we directly replaced the multi-head attention module with ZeroS under original benchmark settings, preserving all other components (MLP/GLU, embeddings, hyperparameters) to ensure strict alignment with previous standards. (Section 4, Experiments) |
| Researcher Affiliation | Collaboration | Jiecheng Lu¹, Xu Han², Yan Sun¹, Viresh Pati¹, Yubin Kim¹, Siddhartha Somani¹, Shihao Yang¹. ¹Georgia Institute of Technology, ²Amazon Web Services. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology and architecture through text and figures (e.g., Figure 1 illustrating the ZeroS block and its components), and provides mathematical formulations for its components. However, it does not include a distinct block labeled 'Pseudocode' or 'Algorithm' with structured steps for the overall method. |
| Open Source Code | Yes | The code implementation is available at this link. (Abstract) |
| Open Datasets | Yes | We evaluate ZeroS on the MAD benchmark [54], RegBench [57], WikiText-103 [60], OpenWebText2 [61], Weather [69], Solar [70], ETT [71]. All of these are well-known and cited public datasets. |
| Dataset Splits | Yes | We conduct language modeling on WikiText-103 following [60]'s setup, with results in Table 2. We follow the setup of [56] for the MQAR task, which evaluates models' ability to learn induction heads for in-context associative recall. We evaluate ZeroS on RegBench [57] following the original experimental setup (Figure 2). Following the setup in [65], we evaluate ZeroS on time series forecasting tasks. |
| Hardware Specification | Yes | Yes, all tasks used in this paper can be trained on the single Nvidia RTX 4090 GPU that we used. (NeurIPS Paper Checklist Q8) |
| Software Dependencies | No | The paper mentions using a 'code environment provided by nanoGPT' for the OpenWebText2 dataset training (Appendix A.5.5), but does not provide specific version numbers for software libraries, programming languages, or other key dependencies. |
| Experiment Setup | Yes | In all experiments, we directly replaced the multi-head attention module with ZeroS under original benchmark settings, preserving all other components (MLP/GLU, embeddings, hyperparameters) to ensure strict alignment with previous standards. (Section 4) |