On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results.
Researcher Affiliation | Collaboration | School of Computer Science & Technology, Huazhong University of Science and Technology; Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL); Ping An Property & Casualty Insurance Company of China, Ltd.; The Chinese University of Hong Kong
Pseudocode | Yes | Our algorithm is outlined in pseudo-code in Algorithm 1 in Appendix F.
Open Source Code | Yes | Our paper uses publicly available datasets and provides the complete code and execution scripts in the supplementary material.
Open Datasets | Yes | We evaluate on the following datasets: mathematical reasoning (GSM8K [8]); factual accuracy (TruthfulQA [32]); realistic knowledge (TriviaQA [21]); multi-domain general knowledge (MMLU benchmark [13]); summarization (CNN/Daily Mail (CNN/DM) [47]).
Dataset Splits | No | The paper states 'All datasets are tested using a 0-shot setting' and mentions training models on their respective training sets, but it does not provide the specific train/validation/test splits or percentages needed for reproduction.
Hardware Specification | Yes | All experiments are performed on H100 GPUs.
Software Dependencies | No | The paper mentions using vLLM for inference but does not provide version numbers for it or for other key software dependencies such as the programming language or deep-learning framework.
Experiment Setup | Yes | For full fine-tuning, we set the batch size to 128, the learning rate to 2e-5, and the optimizer to Adam. For LoRA tuning, we set the rank to 64, the learning rate to 1e-4, and the optimizer to Adam. We train for 3 epochs. During inference, we use greedy decoding and set the batch size to 256, top_p to 1.0, and temperature to 0.05.
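
Illustrative code sketches for several of the rows above follow.

The paper's full procedure is given as Algorithm 1 in its Appendix F (see the Pseudocode row). As rough orientation only, the sketch below shows the generic weak-to-strong logit-fusion pattern the title alludes to: a task-specific delta, taken from a small fine-tuned/untuned model pair, is added to the large model's logits under a fusion weight. The function name and the single static scalar `alpha` are assumptions; the paper's contribution is a dynamic weighting rule that this sketch does not reproduce.

```python
import torch

def fused_next_token_logits(strong_logits: torch.Tensor,
                            weak_tuned_logits: torch.Tensor,
                            weak_base_logits: torch.Tensor,
                            alpha: float) -> torch.Tensor:
    """Generic weak-to-strong logit fusion (illustrative, not the paper's
    exact rule): approximate the task-specific signal by the difference
    between a small fine-tuned expert and its untuned base, then add it,
    scaled by `alpha`, to the large model's logits. A dynamic scheme
    would re-estimate `alpha` at every decoding step; here it is fixed.
    """
    return strong_logits + alpha * (weak_tuned_logits - weak_base_logits)
```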
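
All five evaluation sets listed in the Open Datasets row are public and available on the Hugging Face Hub. A minimal loading sketch follows; the hub IDs, configuration names, and split choices are assumptions to verify against each dataset card, not details taken from the paper.

```python
from datasets import load_dataset

# Hub IDs, config names, and splits below are assumptions; check each
# dataset card before relying on them.
gsm8k    = load_dataset("gsm8k", "main", split="test")
truthful = load_dataset("truthful_qa", "generation", split="validation")
trivia   = load_dataset("trivia_qa", "rc.nocontext", split="validation")
mmlu     = load_dataset("cais/mmlu", "all", split="test")
cnn_dm   = load_dataset("cnn_dailymail", "3.0.0", split="test")
```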
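
The Experiment Setup row pins down the core training hyperparameters. A sketch of how they might be expressed with Hugging Face `transformers` and `peft` is below; the device-level batch splitting, the LoRA alpha and target modules, the optimizer variant (AdamW here versus the paper's "Adam"), and all paths are assumptions.

```python
from transformers import TrainingArguments
from peft import LoraConfig

# Quoted values: global batch size 128, lr 2e-5 (full FT) / 1e-4 (LoRA),
# LoRA rank 64, 3 epochs. Everything else here is an assumption.
full_ft_args = TrainingArguments(
    output_dir="out-full-ft",        # placeholder path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # 16 * 8 = 128 effective, as reported
    learning_rate=2e-5,
    num_train_epochs=3,
    optim="adamw_torch",             # the paper says "Adam"
)

lora_cfg = LoraConfig(
    r=64,                                 # rank reported in the paper
    lora_alpha=128,                       # assumption; not stated
    target_modules=["q_proj", "v_proj"],  # assumption; not stated
)
# For LoRA runs, the reported learning rate is 1e-4 instead of 2e-5.
```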
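
For inference, the paper reports vLLM with batch size 256, top_p 1.0, and temperature 0.05 (described as greedy decoding, though a temperature of 0.05 is only near-greedy). A minimal vLLM sketch with those sampling settings follows; the model path, prompt, and max_tokens are placeholders.

```python
from vllm import LLM, SamplingParams

# Sampling values quoted from the paper; model path, prompt, and
# max_tokens are placeholders. temperature=0.05 is near-greedy,
# not strictly greedy.
params = SamplingParams(temperature=0.05, top_p=1.0, max_tokens=512)
llm = LLM(model="path/to/strong-model")  # placeholder
outputs = llm.generate(["<your prompt here>"], params)
print(outputs[0].outputs[0].text)
```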