Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. |
| Researcher Affiliation | Collaboration | 1TapTap 2OpenNLPLab, Shanghai AI Lab. Correspondence to: Yiran Zhong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Linear Attention Left Product |
| Open Source Code | Yes | The source code is released at github.com/OpenNLPLab/TransnormerLLM. |
| Open Datasets | Yes | TNL records the lowest perplexity on test set after trained on the Wikitext-103 dataset. |
| Dataset Splits | Yes | Table 1. Results on Wikitext-103 (TNN (Qin et al., 2023a)'s setting). ↓ means lower is better. Model PPL (val) PPL (test) Params (M) |
| Hardware Specification | Yes | All the experiments were conducted on A100 80G GPU clusters. |
| Software Dependencies | No | The paper mentions software components like "Metaseq framework," "Pytorch," and "Triton" but does not specify their version numbers, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. We also scaled up our model to 1B and 3B parameters and compared its training loss with top-tier LLM structures... all evaluation results being conducted with a 5-shot setup. |