Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. |
| Researcher Affiliation | Collaboration | 1TapTap, 2OpenNLPLab, Shanghai AI Lab. Correspondence to: Yiran Zhong <zhongyiran@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Linear Attention Left Product (see the illustrative sketch after this table) |
| Open Source Code | Yes | The source code is released at github.com/OpenNLPLab/TransnormerLLM. |
| Open Datasets | Yes | TNL records the lowest perplexity on the test set after being trained on the Wikitext-103 dataset. |
| Dataset Splits | Yes | Table 1. Results on Wikitext-103 (TNN (Qin et al., 2023a)'s setting). ↓ means lower is better. Columns: Model, PPL (val), PPL (test), Params (M). |
| Hardware Specification | Yes | All the experiments were conducted on A100 80G GPU clusters. |
| Software Dependencies | No | The paper mentions software components such as the Metaseq framework, PyTorch, and Triton, but does not specify their version numbers, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. We also scaled up our model to 1B and 3B parameters and compared its training loss with top-tier LLM structures... all evaluation results being conducted with a 5-shot setup. |
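
The pseudocode row above refers to Algorithm 1 (Linear Attention Left Product). For orientation only, below is a minimal PyTorch sketch of the left-product form of causal linear attention, i.e. forming the full (Q K^T) score matrix, masking it causally, and multiplying by V. The function name and tensor shapes are assumptions for illustration; this is not the paper's Lightning Attention implementation, which tiles the sequence and combines left- and right-product computations to avoid the quadratic cost shown here.

```python
import torch

def linear_attention_left_product(q, k, v):
    """Left-product form of causal linear attention (illustrative sketch).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Returns a tensor of the same shape as v.
    """
    seq_len = q.shape[-2]
    # Explicit (seq_len x seq_len) score matrix -- O(n^2) compute and memory.
    scores = q @ k.transpose(-2, -1)
    # Causal mask: position i may only attend to positions j <= i.
    mask = torch.tril(torch.ones(seq_len, seq_len, device=q.device, dtype=q.dtype))
    return (scores * mask) @ v

if __name__ == "__main__":
    q = torch.randn(1, 2, 8, 16)
    k = torch.randn(1, 2, 8, 16)
    v = torch.randn(1, 2, 8, 16)
    print(linear_attention_left_product(q, k, v).shape)  # torch.Size([1, 2, 8, 16])
```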