Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths.
Researcher Affiliation | Collaboration | 1TapTap, 2OpenNLPLab, Shanghai AI Lab. Correspondence to: Yiran Zhong <zhongyiran@gmail.com>.
Pseudocode | Yes | Algorithm 1: Linear Attention Left Product (a reference sketch of the left-product form is given after the table).
Open Source Code | Yes | The source code is released at github.com/OpenNLPLab/TransnormerLLM.
Open Datasets | Yes | TNL records the lowest perplexity on the test set after being trained on the Wikitext-103 dataset.
Dataset Splits | Yes | Table 1. Results on Wikitext-103 (TNN (Qin et al., 2023a)'s setting); ↓ means lower is better. Columns: Model, PPL (val) ↓, PPL (test) ↓, Params (M). (A perplexity computation sketch is given after the table.)
Hardware Specification | Yes | All the experiments were conducted on A100 80G GPU clusters.
Software Dependencies | No | The paper mentions software components like "Metaseq framework," "Pytorch," and "Triton" but does not specify their version numbers, which are necessary for full reproducibility.
Experiment Setup | Yes | We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. We also scaled up our model to 1B and 3B parameters and compared its training loss with top-tier LLM structures... all evaluation results being conducted with a 5-shot setup.
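
The pseudocode row refers to Algorithm 1, the left-product form of causal linear attention, O = [(QK^T) ⊙ M]V with M the lower-triangular causal mask. The PyTorch sketch below is a minimal reference for that formulation only; it is not the paper's Lightning Attention Triton kernel, which tiles the computation to avoid materializing the n × n score matrix. The function name and tensor shapes are illustrative assumptions, not taken from the released code.

```python
import torch

def linear_attention_left_product(q, k, v):
    """Reference left-product causal linear attention: O = [(Q K^T) * M] V.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    This materializes the full (seq_len x seq_len) score matrix, so it is
    O(n^2 d); tiled kernels such as Lightning Attention avoid this cost.
    """
    n = q.shape[-2]
    # Lower-triangular causal mask M.
    mask = torch.tril(torch.ones(n, n, dtype=q.dtype, device=q.device))
    scores = torch.matmul(q, k.transpose(-2, -1)) * mask  # (b, h, n, n)
    return torch.matmul(scores, v)                        # (b, h, n, d)

# Usage with random inputs.
b, h, n, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
out = linear_attention_left_product(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```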
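
The dataset rows report perplexity (PPL) on the Wikitext-103 validation and test splits. For reference, perplexity is the exponential of the mean token-level cross-entropy; the sketch below shows this standard computation with hypothetical function names and random placeholder tensors, not the paper's evaluation code.

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits, targets, ignore_index=-100):
    """Perplexity = exp(mean negative log-likelihood in nats).

    logits: (num_tokens, vocab_size), targets: (num_tokens,).
    Tokens equal to ignore_index (e.g. padding) are excluded from the mean.
    """
    nll = F.cross_entropy(logits, targets,
                          ignore_index=ignore_index, reduction="mean")
    return math.exp(nll.item())

# Usage with random data standing in for a model's validation outputs.
vocab = 50257
logits = torch.randn(1024, vocab)
targets = torch.randint(0, vocab, (1024,))
print(perplexity(logits, targets))
```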