reproducibilityindex.ai

Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models

Authors: Luohe Shi, Yao Yao, Zuchao Li, Lefei Zhang, Hai Zhao

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental evaluations on various LLMs using different benchmarks demonstrate that RTD establishes a new paradigm for augmenting models to downstream tasks. Furthermore, our method exhibits strong orthogonality with traditional methods, allowing for concurrent usage.
Researcher Affiliation	Academia	(1) National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, P. R. China; (2) Department of Computer Science and Engineering, Shanghai Jiao Tong University
Pseudocode	No	The paper does not include a section or figure explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured code-like blocks.
Open Source Code	Yes	Our code can be found at https://github.com/ShiLuohe/ReferenceTrustableDecoding
Open Datasets	Yes	Testing benchmarks are: Massive Multitask Language Understanding (MMLU) [15], AI2 Reasoning Challenge (ARC, both Easy (E) and Challenge (C) parts) [4], Reasoning about Physical Commonsense in Natural Language (PIQA) [5], Open Book Question Answering (OBQA) [30], and Massive Multitask Language Understanding in Chinese (CMMLU) [25]. [...] To generate reference datastores, LLMs are shown to the questions and options in the training split of the benchmarks and we store the attention output.
Dataset Splits	No	The paper mentions 'training split' and 'test set' explicitly but does not specify a distinct validation set with quantitative details such as percentages or sample counts for the main experiments. Appendix D mentions 'Max Seq. Len. 4096' for LoRA tuning, but does not specify a validation split.
Hardware Specification	Yes	All testing are done on a server with 8*A100 80G SXM. For models with less than 15B parameters, 2 of 8 GPUs are used. For models with more than 15B parameters, 4 of 8 GPUs are used.
Software Dependencies	No	All testing are carried out under Hugging Face Transformers library [43]. While the software is mentioned, a specific version number for the 'Hugging Face Transformers library' or any other key software component is not provided.
Experiment Setup	Yes	If not tuned, we set k = 1024, s L = 19, 828, λ = 1 and T = 750 by default. [...] The hyperparameters of LoRA are in Appendix D. Table 11: LoRA Hyper-parameters: Batch Size 4, Epochs 2, Max Seq. Len. 4096, LoRA Target {Q, K, V, O, Up, Down, Gate}_proj, LoRA Rank 16, LoRA α 32, LoRA dropout 0.01, Learning Rate 1e-5, Optimizer AdamW, Adam RMS ϵ 2e-4, Adam β (0.9, 0.999), Adam Weight Decay 0.01, Scheduler Constant LR.
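
For anyone re-running the setup, the values quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class names, field names, and lowercase module names (`q_proj`, etc.) are assumptions made for illustration, and the third default in the quoted sentence is omitted because its symbol is garbled in the extracted text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RTDDefaults:
    """Default RTD settings quoted in the Experiment Setup row.

    Field names are hypothetical; only the values come from the source.
    """
    k: int = 1024      # k = 1024
    lam: float = 1.0   # λ = 1
    T: float = 750.0   # T = 750 (symbol kept as in the paper's notation)


@dataclass(frozen=True)
class LoRAHyperparams:
    """LoRA settings as listed for Table 11 (Appendix D) in the row above."""
    batch_size: int = 4
    epochs: int = 2
    max_seq_len: int = 4096
    # Expansion of "{Q, K, V, O, Up, Down, Gate}_proj"; lowercase spelling
    # is an assumption, not confirmed by the source.
    target_modules: Tuple[str, ...] = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    )
    rank: int = 16
    alpha: int = 32
    dropout: float = 0.01
    learning_rate: float = 1e-5
    optimizer: str = "AdamW"
    adam_eps: float = 2e-4
    adam_betas: Tuple[float, float] = (0.9, 0.999)
    weight_decay: float = 0.01
    scheduler: str = "constant"

    @property
    def scaling(self) -> float:
        """Conventional LoRA scaling factor, alpha / rank."""
        return self.alpha / self.rank


rtd = RTDDefaults()
lora = LoRAHyperparams()
```

With these values, `lora.scaling` evaluates to 2.0, the usual alpha-to-rank ratio. Mapping the fields onto a specific fine-tuning library is left to the reader, since the page does not name one beyond Hugging Face Transformers.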