reproducibilityindex.ai

On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results.
Researcher Affiliation	Collaboration	School of Computer Science & Technology, Huazhong University of Science and Technology, Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), Ping An Property & Casualty Insurance Company of China, Ltd., The Chinese University of Hong Kong
Pseudocode	Yes	Our algorithm is outlined in pseudo-code in Algorithm 1 in Appendix F.
Open Source Code	Yes	Our paper uses publicly available datasets and provides the complete code and execution scripts in the supplementary material.
Open Datasets	Yes	We evaluate on the following datasets: mathematical reasoning (GSM8K [8]); factual accuracy (TruthfulQA [32]); realistic knowledge (TriviaQA [21]); multi-domain general knowledge (MMLU benchmark [13]); summarization (CNN-DailyMail (CNN/DM) [47]).
Dataset Splits	No	The paper states 'All datasets are tested using a 0-shot setting' and mentions training models on respective training sets, but does not provide specific train/validation/test dataset splits or percentages required for reproduction.
Hardware Specification	Yes	All experiments are performed on H100 GPUs.
Software Dependencies	No	The paper mentions using 'vLLM' for inference but does not provide specific version numbers for it or for any other key software dependencies, such as programming languages or deep learning frameworks.
Experiment Setup	Yes	For full fine-tuning, we set the batch size to 128, learning rate to 2e-5, optimizer to Adam. For LoRA tuning, we set the rank to 64, learning rate to 1e-4, optimizer to Adam. We train for 3 epochs. During inference, we use greedy decoding and set batch size to 256, top_p to 1.0 and temperature to 0.05.
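
For anyone attempting a reproduction, the reported hyperparameters could be collected in one place, roughly as in the sketch below. The class and field names (TrainConfig, InferenceConfig, num_shots, and so on) are hypothetical and are not taken from the authors' supplementary code. Values the paper excerpt does not state, such as the LoRA batch size, are left as None rather than guessed, and the 3-epoch setting is assumed to apply to both tuning modes.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported fine-tuning settings."""
    method: str                       # "full" or "lora" (labels chosen here)
    learning_rate: float
    optimizer: str = "Adam"
    epochs: int = 3                   # assumed to apply to both tuning modes
    batch_size: Optional[int] = None  # reported only for full fine-tuning
    lora_rank: Optional[int] = None   # reported only for LoRA tuning


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the reported inference settings."""
    batch_size: int = 256
    top_p: float = 1.0
    temperature: float = 0.05         # near zero, so sampling is close to greedy
    num_shots: int = 0                # all datasets are tested 0-shot


# Values as quoted in the Experiment Setup row.
FULL_FT = TrainConfig(method="full", learning_rate=2e-5, batch_size=128)
LORA = TrainConfig(method="lora", learning_rate=1e-4, lora_rank=64)
INFERENCE = InferenceConfig()

if __name__ == "__main__":
    for cfg in (FULL_FT, LORA, INFERENCE):
        print(asdict(cfg))
```

Keeping unreported values as None makes the gaps flagged in the table (dataset splits, software versions) visible in the configuration itself instead of hiding them behind defaults.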