DiJiang: Efficient Large Language Models through Compact Kernelization
Authors: Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmarks while requiring only about 1/50 of the training cost. |
| Researcher Affiliation | Collaboration | Hanting Chen*1, Zhicheng Liu*1, Xutao Wang1, Yuchuan Tian2, Yunhe Wang1; {chenhanting,yunhe.wang}@huawei.com; *Equal contribution. 1Huawei Noah's Ark Lab, 2Peking University. |
| Pseudocode | Yes | Algorithm 1 Frequency domain kernelization for efficient language models (an illustrative sketch follows the table). |
| Open Source Code | Yes | Code is available at https://github.com/YuchuanTian/DiJiang. |
| Open Datasets | Yes | We opted to validate our method using Pythia (Biderman et al., 2023), a model with a fully public dataset and training procedure, enabling fair comparisons. ... utilized the Pile dataset. The Pile (Gao et al., 2020) is an 825 GiB corpus of English text, specifically designed for training large-scale language models. |
| Dataset Splits | No | The paper mentions using Pythia and Pile datasets and adhering to Pythia's training settings, but does not explicitly provide specific dataset split percentages or counts for training, validation, and testing within its text. |
| Hardware Specification | Yes | Training time is measured using A800. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9). |
| Experiment Setup | No | The paper states, 'We adhered to the exact training settings employed by Pythia, including learning rates, optimizers, and other hyperparameters,' but does not provide the specific values for these settings within its main text. |
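
The "frequency domain kernelization" named in the pseudocode row refers to replacing softmax attention with a kernelized, linear-time approximation whose feature map operates in a transform (DCT) domain. The sketch below only illustrates that general idea and is not a reproduction of the paper's Algorithm 1: the `dct_feature_map` construction, the exponential nonlinearity, and the `kernelized_attention` helper are assumptions made for this example.

```python
# Hypothetical sketch of frequency-domain kernelized attention.
# Not the authors' exact Algorithm 1: the feature map and normalization
# below are illustrative assumptions.
import numpy as np
from scipy.fft import dct


def dct_feature_map(x: np.ndarray) -> np.ndarray:
    """Map inputs to a non-negative feature space via a DCT along the head dim.

    The type-II DCT plus an exponential nonlinearity stands in for the
    paper's kernel construction (assumption for this sketch).
    """
    f = dct(x, type=2, norm="ortho", axis=-1)
    return np.exp(f - f.max(axis=-1, keepdims=True))  # keep values bounded


def kernelized_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Linear-time attention: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V)."""
    q_f, k_f = dct_feature_map(q), dct_feature_map(k)      # (n, d) each
    kv = k_f.T @ v                                         # (d, d_v), O(n*d*d_v)
    normalizer = q_f @ k_f.sum(axis=0, keepdims=True).T    # (n, 1)
    return (q_f @ kv) / (normalizer + 1e-6)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 64
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = kernelized_attention(q, k, v)
    print(out.shape)  # (128, 64)
```

Because the key-value summary `phi(K)^T V` is computed once and reused for every query, the cost grows linearly in sequence length rather than quadratically, which is the efficiency property the paper's kernelization targets.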