Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Authors: Wanyun Cui, Qianle Wang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of CherryQ. CherryQ outperforms existing quantization approaches in terms of perplexity and downstream task performance. |
| Researcher Affiliation | Academia | Wanyun Cui*, Qianle Wang Shanghai University of Finance and Economics MoE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 CherryQ |
| Open Source Code | Yes | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have attached the codes in the submission. |
| Open Datasets | Yes | For the quantization of the base LLMs, we follow [9] to use C4 [20] as the training data. We selected the first four partitions of C4 and chose data with a length of 2048 tokens, resulting in a total of 50k samples of 2048 tokens. For the chat LLMs, since Vicuna-1.5 [5] is obtained by supervised fine-tuning based on ShareGPT [5], we also use the ShareGPT dataset for training. |
| Dataset Splits | Yes | We selected the first four partitions of C4 and chose data with a length of 2048 tokens, resulting in a total of 50k samples of 2048 tokens. |
| Hardware Specification | Yes | For all LLM scales (7B, 13B), and both base models and chat models (LLaMA2, Vicuna-v1.5), we train the models on a single node with 8 x A100 80GiB GPUs. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers, such as Python or PyTorch versions, that are needed to replicate the experiment. |
| Experiment Setup | Yes | We use a total batch size of 128, a learning rate of 2e-5, a weight decay of 0.0, a cosine scheduler with 5% warm-up steps. The final learning rate is 25% of the peak learning rate for 2/3-bit LLMs, 10% for 4-bit LLMs. We train 1 epoch on base models, 2 epochs on chat models. |
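
The learning-rate schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `lr_at_step`, the linear shape of the warm-up, and the step count (50k samples at batch size 128, keeping the last partial batch) are assumptions; only the peak rate, the 5% warm-up fraction, and the 25%/10% final-rate fractions come from the quoted text.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_frac=0.05, final_frac=0.25):
    """Learning rate at a 0-indexed step: warm-up, then cosine decay to a floor.

    Assumption: the warm-up is linear; the quoted setup states only its length.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly up to the peak over the warm-up steps.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    floor = peak_lr * final_frac  # 25% of peak for 2/3-bit, 10% for 4-bit
    return floor + 0.5 * (peak_lr - floor) * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count for one epoch on the base models:
# 50k samples at a total batch size of 128, keeping the last partial batch.
steps_per_epoch = math.ceil(50_000 / 128)

lr_start_of_decay = lr_at_step(int(steps_per_epoch * 0.05), steps_per_epoch)
lr_end_low_bit = lr_at_step(steps_per_epoch, steps_per_epoch, final_frac=0.25)
lr_end_4bit = lr_at_step(steps_per_epoch, steps_per_epoch, final_frac=0.10)
```

Under these assumptions the rate reaches its peak when the warm-up ends and settles at the stated fraction of the peak once training finishes. The software row above reports no framework versions, so the sketch deliberately avoids any library-specific scheduler.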