DiJiang: Efficient Large Language Models through Compact Kernelization
Authors: Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmarks while requiring only about 1/50 of the training cost. |
| Researcher Affiliation | Collaboration | Hanting Chen*1, Zhicheng Liu*1, Xutao Wang1, Yuchuan Tian2, Yunhe Wang1; {chenhanting,yunhe.wang}@huawei.com; *Equal contribution. 1Huawei Noah's Ark Lab, 2Peking University. |
| Pseudocode | Yes | Algorithm 1 Frequency domain kernelization for efficient language models (an illustrative sketch follows the table). |
| Open Source Code | Yes | Code is available at https://github.com/YuchuanTian/DiJiang. |
| Open Datasets | Yes | We opted to validate our method using Pythia (Biderman et al., 2023), a model with a fully public dataset and training procedure, enabling fair comparisons. ... utilized the Pile dataset. The Pile (Gao et al., 2020) is an 825 GiB corpus of English text, specifically designed for training large-scale language models. |
| Dataset Splits | No | The paper mentions using Pythia and Pile datasets and adhering to Pythia's training settings, but does not explicitly provide specific dataset split percentages or counts for training, validation, and testing within its text. |
| Hardware Specification | Yes | Training time is measured using A800. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9). |
| Experiment Setup | No | The paper states, 'We adhered to the exact training settings employed by Pythia, including learning rates, optimizers, and other hyperparameters,' but does not provide the specific values for these settings within its main text. |
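
The "frequency domain kernelization" named in the pseudocode row refers to replacing softmax attention with a kernelized, linear-time approximation whose feature map operates in a transform (DCT) domain. The sketch below only illustrates that general idea and is not a reproduction of the paper's Algorithm 1: the `dct_feature_map` construction, the exponential nonlinearity, and the `kernelized_attention` helper are assumptions made for this example.

```python
# Hypothetical sketch of frequency-domain kernelized attention.
# Not the authors' exact Algorithm 1: the feature map and normalization
# below are illustrative assumptions.
import numpy as np
from scipy.fft import dct


def dct_feature_map(x: np.ndarray) -> np.ndarray:
    """Map inputs to a non-negative feature space via a DCT along the head dim.

    The type-II DCT plus an exponential nonlinearity stands in for the
    paper's kernel construction (assumption for this sketch).
    """
    f = dct(x, type=2, norm="ortho", axis=-1)
    return np.exp(f - f.max(axis=-1, keepdims=True))  # keep values bounded


def kernelized_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Linear-time attention: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V)."""
    q_f, k_f = dct_feature_map(q), dct_feature_map(k)      # (n, d) each
    kv = k_f.T @ v                                         # (d, d_v), O(n*d*d_v)
    normalizer = q_f @ k_f.sum(axis=0, keepdims=True).T    # (n, 1)
    return (q_f @ kv) / (normalizer + 1e-6)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 64
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = kernelized_attention(q, k, v)
    print(out.shape)  # (128, 64)
```

Because the key-value summary `phi(K)^T V` is computed once and reused for every query, the cost grows linearly in sequence length rather than quadratically, which is the efficiency property the paper's kernelization targets.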