Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Authors: Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method. In this section, we present empirical results demonstrating the effectiveness of BOMF in various NLP tasks.
Researcher Affiliation | Academia | Chaeyun Jang (KAIST, jcy9911@kaist.ac.kr), Hyungi Lee (KAIST, lhk2708@kaist.ac.kr), Jungtaek Kim (University of Pittsburgh, jungtaek.kim@pitt.edu), Juho Lee (KAIST, juholee@kaist.ac.kr)
Pseudocode | Yes | Refer to Algorithm 1 in Appendix B for the summary of BOMF.
Open Source Code | Yes | Code will be available at: https://github.com/chaeyoon-jang/bomf.git.
Open Datasets | Yes | For the RoBERTa model, we evaluated the performance for classification and utilized a subset of the GLUE benchmark [65]. For the T5-base model, we utilized the Stanford Question Answering Dataset (SQuAD 2.0) [52]. For the summarization task, we employed the Samsung Abstractive Messenger Summarization (SAMSum) dataset [19]. We also used the Korean Medical Multiple Choice Question Answering (KorMedMCQA) dataset [32].
Dataset Splits | Yes | In this paper, we explore the process of fine-tuning PLMs using two types of datasets: a downstream training dataset Dtrn and a validation dataset Dval. The Recognizing Textual Entailment (RTE) task... comprising 2,490 training instances and 277 validation instances.
Hardware Specification | Yes | These experiments are conducted on NVIDIA RTX 3090 and NVIDIA RTX A6000 GPUs.
Software Dependencies | Yes | Our implementation leverages key libraries, including PyTorch 2.0.1 [49], Huggingface Transformers [69], and BoTorch [4], to construct a robust framework for our experiments.
Experiment Setup | Yes | Specific details on the fine-tuning methods, procedures, and experiments are provided in Tables 5, 6, 7, and 8.
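For orientation, the core idea audited above is fusing fine-tuned checkpoints by searching the mixing coefficients with Bayesian optimization (Algorithm 1 in Appendix B; the released code builds on BoTorch). The sketch below is a minimal, self-contained illustration of that idea only, not the paper's procedure: it runs a toy Gaussian-process BO loop (RBF kernel, expected improvement) over a single scalar mixing weight between two checkpoints. The names `merge_state_dicts` and `val_metric`, and the quadratic stand-in for "merge, then score on the validation set," are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

def merge_state_dicts(sd_a, sd_b, lam):
    """Per-parameter linear interpolation between two checkpoints."""
    return {k: (1.0 - lam) * sd_a[k] + lam * sd_b[k] for k in sd_a}

def val_metric(lam):
    """Stand-in for 'merge at weight lam, then score on the validation
    set'; here we simply pretend the best mixing weight is 0.65."""
    return -(lam - 0.65) ** 2

def gp_posterior(X, y, Xs, ls=0.25, jitter=1e-6):
    """Posterior mean/std of an RBF-kernel GP fit to (X, y), at Xs."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    yc = y - y.mean()                       # center targets
    K = k(X, X) + jitter * np.eye(len(X))   # jitter for stability
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, yc) + y.mean()
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for maximization."""
    z = (mu - best) / sigma
    pdf = np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    return (mu - best) * cdf + sigma * pdf

grid = np.linspace(0.0, 1.0, 201)           # candidate mixing weights
X = np.array([0.1, 0.9])                    # two initial evaluations
y = np.array([val_metric(x) for x in X])
for _ in range(8):                          # BO iterations
    mu, sd = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(expected_improvement(mu, sd, y.max()))]
    X, y = np.append(X, nxt), np.append(y, val_metric(nxt))

best_lam = X[np.argmax(y)]                  # best weight found
merged = merge_state_dicts({"w": np.zeros(3)}, {"w": np.ones(3)}, best_lam)
```

In the paper itself the search is over more than one coefficient and the objective involves both loss and task metric, which is presumably why the authors rely on BoTorch rather than a hand-rolled GP like this one.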