On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results. |
| Researcher Affiliation | Collaboration | School of Computer Science & Technology, Huazhong University of Science and Technology, Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), Ping An Property & Casualty Insurance Company of China, Ltd., The Chinese University of Hong Kong |
| Pseudocode | Yes | Our algorithm is outlined in pseudo-code in Algorithm 1 in Appendix F. |
| Open Source Code | Yes | Our paper uses publicly available datasets and provides the complete code and execution scripts in the supplementary material. |
| Open Datasets | Yes | We evaluate on the following datasets: mathematical reasoning (GSM8K [8]); factual accuracy (TruthfulQA [32]); realistic knowledge (TriviaQA [21]); multi-domain general knowledge (MMLU benchmark [13]); summarization (CNN/Daily Mail (CNN/DM) [47]). |
| Dataset Splits | No | The paper states 'All datasets are tested using a 0-shot setting' and mentions training models on their respective training sets, but does not provide the specific train/validation/test splits or percentages required for reproduction. |
| Hardware Specification | Yes | All experiments are performed on H100 GPUs. |
| Software Dependencies | No | The paper mentions using vLLM for inference but does not provide version numbers for it or for other key software dependencies such as the programming language or deep learning framework. |
| Experiment Setup | Yes | For full fine-tuning, we set the batch size to 128, learning rate to 2e-5, optimizer to Adam. For LoRA tuning, we set the rank to 64, learning rate to 1e-4, optimizer to Adam. We train for 3 epochs. During inference, we use greedy decoding and set batch size to 256, top_p to 1.0 and temperature to 0.05. (See the configuration sketch below the table.) |
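
The snippet below collects the reported hyperparameters into plain Python dataclasses. The values are copied from the quoted setup; the class and field names are my own and are not taken from the paper's released code.

```python
from dataclasses import dataclass


@dataclass
class FullFinetuneConfig:
    """Full fine-tuning hyperparameters as reported in the paper."""
    batch_size: int = 128
    learning_rate: float = 2e-5
    optimizer: str = "adam"
    epochs: int = 3


@dataclass
class LoRATuningConfig:
    """LoRA tuning hyperparameters as reported in the paper."""
    lora_rank: int = 64
    learning_rate: float = 1e-4
    optimizer: str = "adam"
    epochs: int = 3


@dataclass
class InferenceConfig:
    """Decoding settings as reported in the paper."""
    batch_size: int = 256
    top_p: float = 1.0
    temperature: float = 0.05  # effectively greedy decoding at this temperature
```

The paper also reports that Algorithm 1 (Appendix F) fuses the logits of a small task-tuned model into a large base model with dynamically chosen weights. The sketch below shows only a generic, proxy-tuning-style fusion with a *fixed* weight `alpha`; it is a hypothetical illustration of logit-level fusion, not the paper's dynamic weighting, and the function name, `alpha`, and the random-tensor demo are my own.

```python
import torch


def fused_next_token_logits(
    strong_logits: torch.Tensor,      # logits of the large (strong) base model
    weak_tuned_logits: torch.Tensor,  # logits of the small, task-tuned (weak) expert
    weak_base_logits: torch.Tensor,   # logits of the small, untuned base model
    alpha: float = 1.0,               # hypothetical fixed fusion weight
) -> torch.Tensor:
    # Shift the strong model's logits by the weak expert's tuned-minus-base
    # logit offset, scaled by a fixed alpha. The paper's Algorithm 1 instead
    # selects this weight dynamically at each decoding step.
    return strong_logits + alpha * (weak_tuned_logits - weak_base_logits)


if __name__ == "__main__":
    vocab_size = 32000
    strong, weak_tuned, weak_base = (torch.randn(vocab_size) for _ in range(3))
    fused = fused_next_token_logits(strong, weak_tuned, weak_base, alpha=0.5)
    next_token = torch.argmax(fused)  # greedy pick over the fused logits
    print(int(next_token))
```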