Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DiJiang: Efficient Large Language Models through Compact Kernelization

Authors: Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang

ICML 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmark while requires only about 1/50 training cost." |
| Researcher Affiliation | Collaboration | "Hanting Chen\*1, Zhicheng Liu\*1, Xutao Wang1, Yuchuan Tian2, Yunhe Wang1; \*Equal contribution. 1Huawei Noah's Ark Lab, 2Peking University." |
| Pseudocode | Yes | "Algorithm 1: Frequency domain kernelization for efficient language models." |
| Open Source Code | Yes | "Code is available at https://github.com/YuchuanTian/DiJiang." |
| Open Datasets | Yes | "We opted to validate our method using Pythia (Biderman et al., 2023), a model with a fully public dataset and training procedure, enabling fair comparisons. ... utilized the Pile dataset. The Pile (Gao et al., 2020) is an 825 GiB corpus of English text, specifically designed for training large-scale language models." |
| Dataset Splits | No | The paper mentions using the Pythia and Pile datasets and adhering to Pythia's training settings, but does not explicitly provide dataset split percentages or counts for training, validation, and testing. |
| Hardware Specification | Yes | "Training time is measured using A800." |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library or solver names such as Python 3.8 or PyTorch 1.9). |
| Experiment Setup | No | The paper states, "We adhered to the exact training settings employed by Pythia, including learning rates, optimizers, and other hyperparameters," but does not provide the specific values for these settings in its main text. |
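The pseudocode finding above refers to the paper's Algorithm 1 (frequency-domain kernelization), whose details are in the linked repository. Purely as an illustration of the general idea, and not the authors' algorithm, the following is a minimal sketch of kernelized (linear) attention with a DCT-based feature map; the `dct_matrix` helper, the `exp` feature map, and all shapes are assumptions of this sketch.

```python
import numpy as np

def dct_matrix(d):
    # Orthonormal DCT-II basis: row k is cos(pi * (n + 0.5) * k / d), scaled.
    n = np.arange(d)
    C = np.cos(np.pi * (n + 0.5) * n[:, None] / d)
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / d)

def linear_attention(Q, K, V):
    # Kernel trick: phi(Q) @ (phi(K)^T V) costs O(n d^2) instead of the
    # O(n^2 d) of explicit softmax attention over n tokens.
    C = dct_matrix(Q.shape[-1])
    phi_q = np.exp(Q @ C.T)  # positive feature map (illustrative choice)
    phi_k = np.exp(K @ C.T)
    kv = phi_k.T @ V                      # (d, d) summary of keys/values
    z = phi_q @ phi_k.sum(axis=0)         # per-query normalizer
    return (phi_q @ kv) / z[:, None]
```

By associativity, this is numerically identical to forming the full attention matrix `phi(Q) @ phi(K).T` and row-normalizing it, which is what makes the linear-time reordering valid.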