Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models

Authors: Wanyun Cui, Qianle Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of CherryQ. CherryQ outperforms existing quantization approaches in terms of perplexity and downstream task performance.
Researcher Affiliation | Academia | Wanyun Cui*, Qianle Wang. Shanghai University of Finance and Economics; MoE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics. cui.wanyun@sufe.edu.cn, wql20000111@stu.sufe.edu.cn
Pseudocode | Yes | Algorithm 1 CherryQ
Open Source Code | Yes | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have attached the codes in the submission.
Open Datasets | Yes | For the quantization of the base LLMs, we follow [9] to use C4 [20] as the training data. We selected the first four partitions of C4 and chose data with a length of 2048 tokens, resulting in a total of 50k samples of 2048 tokens. For the chat LLMs, since Vicuna-1.5 [5] is obtained by supervised fine-tuning based on ShareGPT [5], we also use the ShareGPT dataset for training.
Dataset Splits | Yes | We selected the first four partitions of C4 and chose data with a length of 2048 tokens, resulting in a total of 50k samples of 2048 tokens.
Hardware Specification | Yes | For all LLM scales (7B, 13B), and both base models and chat models (LLaMA2, Vicuna-v1.5), we train the models on a single node with 8 x A100 80GiB GPUs.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers, such as Python or PyTorch versions, that are needed to replicate the experiment.
Experiment Setup | Yes | We use a total batch size of 128, a learning rate of 2e-5, a weight decay of 0.0, a cosine scheduler with 5% warm-up steps. The final learning rate is 25% of the peak learning rate for 2/3-bit LLMs, 10% for 4-bit LLMs. We train 1 epoch on base models, 2 epochs on chat models.
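
The Pseudocode row refers to Algorithm 1 (CherryQ). Its central idea, keeping a small set of high-impact "cherry" parameters in high precision while quantizing the remainder, can be illustrated with a minimal sketch. The impact proxy, the 1% cherry ratio, and the per-tensor symmetric quantizer below are assumptions chosen for illustration, not the authors' Algorithm 1.

```python
# Minimal sketch of mixed-precision quantization with "cherry" parameters
# (NOT the paper's Algorithm 1): the top fraction of parameters ranked by an
# impact score keep full precision; everything else is uniformly quantized.
import torch

def quantize_with_cherries(weight: torch.Tensor,
                           impact: torch.Tensor,
                           bits: int = 4,
                           cherry_ratio: float = 0.01):
    """Symmetric per-tensor fake quantization, except the top `cherry_ratio`
    fraction of parameters (ranked by `impact`), which keep their FP values."""
    n_cherry = max(1, int(cherry_ratio * weight.numel()))
    cherry_idx = torch.topk(impact.flatten(), n_cherry).indices

    # Uniform symmetric quantization of the whole tensor.
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / qmax
    dequant = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax) * scale

    # Restore the cherry parameters at full precision.
    flat = dequant.flatten().clone()
    flat[cherry_idx] = weight.flatten()[cherry_idx]
    return flat.view_as(weight), cherry_idx
```

In practice the impact score would be estimated on a small calibration set (for example, from squared gradients) and the selection done per weight matrix; the sketch only shows the selection and fake-quantization step.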
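
The Open Datasets and Dataset Splits rows describe selecting 50k sequences of 2048 tokens from the first four partitions of C4. A minimal sketch of such a selection is shown below, assuming the HuggingFace datasets and transformers libraries; the shard names, the LLaMA-2 tokenizer identifier, and the rule of keeping only documents with at least 2048 tokens are assumptions, since the paper's preprocessing script is not quoted here.

```python
# Minimal sketch of the C4 calibration-data selection described above:
# 50k sequences of 2048 tokens drawn from the first four C4 shards.
# Shard names, the tokenizer identifier, and the length filter are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed tokenizer
SEQ_LEN, TARGET_SAMPLES = 2048, 50_000

# First four shards of the English C4 training split ("first four partitions").
shards = [f"en/c4-train.{i:05d}-of-01024.json.gz" for i in range(4)]
stream = load_dataset("allenai/c4", data_files={"train": shards},
                      split="train", streaming=True)

samples = []
for example in stream:
    ids = tokenizer(example["text"], truncation=False)["input_ids"]
    if len(ids) >= SEQ_LEN:              # keep documents long enough for a full window
        samples.append(ids[:SEQ_LEN])
    if len(samples) >= TARGET_SAMPLES:
        break
```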
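
The Experiment Setup row specifies a peak learning rate of 2e-5, 5% warm-up, a cosine schedule, and a final learning rate of 25% (2/3-bit) or 10% (4-bit) of the peak. A standard cosine schedule decays to zero, so the reported floor needs a small custom schedule; the sketch below shows one way to express it in PyTorch, with AdamW as an assumed optimizer choice (the quoted text does not name the optimizer).

```python
# Minimal sketch of the reported schedule: peak LR 2e-5, weight decay 0.0,
# 5% linear warm-up, cosine decay to a floor of 25% of the peak (2/3-bit)
# or 10% (4-bit). AdamW and the placeholder model are assumptions.
import math
import torch

def cosine_with_floor(optimizer, total_steps, warmup_frac=0.05, final_ratio=0.25):
    warmup_steps = int(warmup_frac * total_steps)

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)               # linear warm-up to the peak
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays from 1 to 0
        return final_ratio + (1.0 - final_ratio) * cosine    # decays from 1 to final_ratio

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

model = torch.nn.Linear(16, 16)  # placeholder for the quantization-aware model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)
scheduler = cosine_with_floor(optimizer, total_steps=50_000 // 128)  # ~1 epoch at batch size 128
```

With the 50k training samples and a total batch size of 128, one epoch corresponds to roughly 390 optimizer steps, which is what the `total_steps` argument encodes.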