Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models
Authors: Luohe Shi, Yao Yao, Zuchao Li, Lefei Zhang, Hai Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on various LLMs using different benchmarks demonstrate that RTD establishes a new paradigm for augmenting models to downstream tasks. Furthermore, our method exhibits strong orthogonality with traditional methods, allowing for concurrent usage. |
| Researcher Affiliation | Academia | National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, P. R. China; Department of Computer Science and Engineering, Shanghai Jiao Tong University |
| Pseudocode | No | The paper does not include a section or figure explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured code-like blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/ShiLuohe/ReferenceTrustableDecoding |
| Open Datasets | Yes | Testing benchmarks are: Massive Multitask Language Understanding (MMLU) [15], AI2 Reasoning Challenge (ARC, both Easy (E) and Challenge (C) parts) [4], Reasoning about Physical Commonsense in Natural Language (PIQA) [5], Open Book Question Answering (OBQA) [30], and Massive Multitask Language Understanding in Chinese (CMMLU) [25]. [...] To generate reference datastores, LLMs are shown the questions and options in the training split of the benchmarks and we store the attention output (a hedged datastore-construction sketch follows the table). |
| Dataset Splits | No | The paper explicitly mentions a 'training split' and a 'test set' but does not specify a distinct validation set with quantitative details such as percentages or sample counts for the main experiments. Appendix D mentions 'Max Seq. Len. 4096' for LoRA tuning, but does not specify a validation split. |
| Hardware Specification | Yes | All testing is done on a server with 8*A100 80G SXM GPUs. For models with fewer than 15B parameters, 2 of the 8 GPUs are used. For models with more than 15B parameters, 4 of the 8 GPUs are used. |
| Software Dependencies | No | All testing is carried out using the Hugging Face Transformers library [43]. While the software is mentioned, no specific version number is provided for the Hugging Face Transformers library or any other key software component. |
| Experiment Setup | Yes | If not tuned, we set k = 1024, s_L = 19,828, λ = 1, and T = 750 by default. [...] The hyperparameters of LoRA are in Appendix D. Table 11 (LoRA hyper-parameters): Batch Size 4, Epochs 2, Max Seq. Len. 4096, LoRA Target {Q, K, V, O, Up, Down, Gate}_proj, LoRA Rank 16, LoRA α 32, LoRA Dropout 0.01, Learning Rate 1e-5, Optimizer AdamW, Adam RMS ε 2e-4, Adam β (0.9, 0.999), Adam Weight Decay 0.01, Scheduler Constant LR (a hedged LoRA configuration sketch follows the table). |
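
To make the datastore-generation step quoted under Open Datasets concrete, below is a minimal sketch of capturing per-token attention outputs while a model reads training-split questions and options. The model name (a small LLaMA-architecture stand-in), the `train_prompts` placeholder, and the choice to hook the last decoder layer's `o_proj` are illustrative assumptions; the authors' released code may build the reference datastore differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: a small LLaMA-architecture model as a stand-in for the LLMs in the paper.
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

# Placeholder prompts; in practice these are the questions and options from the
# benchmark's training split.
train_prompts = [
    "Question: Which gas do plants absorb?\nOptions: (A) Oxygen (B) Carbon dioxide",
]

captured = []

def save_attention_output(module, inputs, output):
    # The hooked module is the attention output projection, so `output` has
    # shape (batch, seq_len, hidden_size).
    captured.append(output.detach().float().cpu())

# Illustrative choice: hook the output projection of the last decoder layer.
handle = model.model.layers[-1].self_attn.o_proj.register_forward_hook(save_attention_output)

keys = []
with torch.no_grad():
    for prompt in train_prompts:
        batch = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
        batch = {k: v.to(device) for k, v in batch.items()}
        model(**batch)
        keys.append(captured.pop().squeeze(0))  # (seq_len, hidden_size)

handle.remove()
datastore = torch.cat(keys, dim=0)  # (total_tokens, hidden_size) reference datastore
torch.save(datastore, "rtd_reference_datastore.pt")
```

At inference time, RTD retrieves from such a datastore using the defaults quoted above (e.g. k = 1024); the retrieval and decoding steps themselves are not sketched here.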
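
The LoRA baseline settings in Table 11 map directly onto the Hugging Face `peft` and `transformers` APIs. The sketch below only mirrors the numeric values from the row above; the base model, dataset handling, and output directory are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA settings mirroring Table 11: rank 16, alpha 32, dropout 0.01,
# targets {Q, K, V, O, Up, Down, Gate}_proj.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)

# Assumption: the same small stand-in model used in the previous sketch.
base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = get_peft_model(base_model, lora_config)

# Optimizer/schedule settings mirroring Table 11: batch size 4, 2 epochs,
# LR 1e-5 with a constant schedule, AdamW with β = (0.9, 0.999), ε = 2e-4,
# weight decay 0.01. The max sequence length (4096) is applied at
# tokenization time, not here.
training_args = TrainingArguments(
    output_dir="lora-rtd-baseline",   # assumed path
    per_device_train_batch_size=4,
    num_train_epochs=2,
    learning_rate=1e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=2e-4,
    weight_decay=0.01,
)
```

Training would then proceed with a standard `Trainer` over sequences tokenized to at most 4096 tokens.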