DiJiang: Efficient Large Language Models through Compact Kernelization

Authors: Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmarks while requiring only about 1/50 of the training cost.
Researcher Affiliation | Collaboration | Hanting Chen*1, Zhicheng Liu*1, Xutao Wang1, Yuchuan Tian2, Yunhe Wang1; {chenhanting,yunhe.wang}@huawei.com; *Equal contribution; 1Huawei Noah's Ark Lab, 2Peking University.
Pseudocode | Yes | Algorithm 1: Frequency domain kernelization for efficient language models (an illustrative sketch of this kernelization is given after the table).
Open Source Code | Yes | Code is available at https://github.com/YuchuanTian/DiJiang.
Open Datasets | Yes | We opted to validate our method using Pythia (Biderman et al., 2023), a model with a fully public dataset and training procedure, enabling fair comparisons. ... utilized the Pile dataset. The Pile (Gao et al., 2020) is an 825 GiB corpus of English text, specifically designed for training large-scale language models.
Dataset Splits | No | The paper mentions using the Pythia and Pile datasets and adhering to Pythia's training settings, but does not explicitly provide dataset split percentages or counts for training, validation, and testing.
Hardware Specification | Yes | Training time is measured using A800.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names such as Python 3.8 or PyTorch 1.9).
Experiment Setup | No | The paper states, 'We adhered to the exact training settings employed by Pythia, including learning rates, optimizers, and other hyperparameters,' but does not provide the specific values for these settings in its main text.
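
For orientation, here is a minimal, hypothetical sketch of the frequency-domain kernelization idea referenced as Algorithm 1 above. It assumes the kernel trick amounts to mapping queries and keys into the frequency domain with a discrete cosine transform and then computing linear attention over positive features; the function names, the scipy.fft.dct settings, and the exp feature map are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch: DCT-based kernelized (linear) attention, not the DiJiang code.
# Assumption: softmax(QK^T)V is approximated by phi(Q)(phi(K)^T V) / (phi(Q) phi(K)^T 1),
# where phi maps each token's features into the frequency domain via a DCT.
import numpy as np
from scipy.fft import dct

def frequency_kernel_features(x):
    # Per-token DCT over the feature dimension, then a positive, numerically stable map.
    z = dct(x, type=2, norm="ortho", axis=-1)
    return np.exp(z - z.max(axis=-1, keepdims=True))

def kernelized_attention(Q, K, V):
    # Linear attention: O(n) in sequence length instead of O(n^2) softmax attention.
    phi_q = frequency_kernel_features(Q)          # (n, d)
    phi_k = frequency_kernel_features(K)          # (n, d)
    kv = phi_k.T @ V                              # (d, d_v), shared across all queries
    normalizer = phi_q @ phi_k.sum(axis=0)        # (n,)
    return (phi_q @ kv) / normalizer[:, None]

# Usage example on random data
rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = rng.normal(size=(3, n, d))
print(kernelized_attention(Q, K, V).shape)        # (8, 16)

In a full model, a feature map of this kind would replace softmax attention in each Transformer layer, which is the source of the reduced training cost and faster inference the paper reports relative to quadratic attention.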