Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Authors: Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method. In this section, we present empirical results demonstrating the effectiveness of BOMF in various NLP tasks. |
| Researcher Affiliation | Academia | Chaeyun Jang (KAIST, jcy9911@kaist.ac.kr); Hyungi Lee (KAIST, lhk2708@kaist.ac.kr); Jungtaek Kim (University of Pittsburgh, jungtaek.kim@pitt.edu); Juho Lee (KAIST, juholee@kaist.ac.kr) |
| Pseudocode | Yes | Refer to Algorithm 1 in Appendix B for the summary of BOMF. |
| Open Source Code | Yes | Code will be available at: https://github.com/chaeyoon-jang/bomf.git. |
| Open Datasets | Yes | For the RoBERTa model, we evaluated classification performance on a subset of the GLUE benchmark [65]. For the T5-base model, we utilized the Stanford Question Answering Dataset (SQuAD 2.0) [52]. For the summarization task, we employed the Samsung Abstractive Messenger Summarization (SAMSum) dataset [19]. For multiple-choice question answering, we used the Korean Medical Multiple Choice Question Answering (KorMedMCQA) dataset [32]. |
| Dataset Splits | Yes | In this paper, we explore the process of fine-tuning PLMs using two types of datasets: a downstream training dataset D_trn and a validation dataset D_val. The Recognizing Textual Entailment (RTE) task... comprising 2,490 training instances and 277 validation instances. (These split sizes can be checked with the sketch following this table.) |
| Hardware Specification | Yes | These experiments are rigorously conducted on high-performance computing hardware, specifically NVIDIA RTX 3090 and NVIDIA RTX A6000 GPUs, to ensure the efficiency and scalability of our models. |
| Software Dependencies | Yes | Our implementation leverages key libraries, including PyTorch 2.0.1 [49], Huggingface Transformers [69], and BoTorch [4], to construct a robust framework for our experiments. (A usage sketch with these libraries follows this table.) |
| Experiment Setup | Yes | Specific details on the fine-tuning methods and procedures for each task are provided in Tables 5, 6, and 7; further details about the models and experiments can be found in Table 8. |
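
The split sizes quoted in the Dataset Splits row (2,490 RTE training instances and 277 validation instances) match the standard GLUE RTE split and can be sanity-checked with the Huggingface `datasets` library. This is an editor's illustration, not code from the paper.

```python
# Editor's check of the GLUE RTE split sizes cited above (not the authors' data pipeline).
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print(len(rte["train"]), len(rte["validation"]))  # expected: 2490 277
```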
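
The Software Dependencies row names PyTorch, Huggingface Transformers, and BoTorch. The sketch below shows one way to wire those pieces together for Bayesian-optimization-guided fusion of fine-tuned checkpoints: a GP surrogate over candidate mixing weights, with a validation metric as the objective. It is a minimal, single-objective illustration under stated assumptions, not the authors' BOMF procedure (summarized in Algorithm 1 of their Appendix B and released at the repository linked above); `build_model` and `validation_metric` are hypothetical placeholders.

```python
"""Minimal sketch: Bayesian optimization over linear fusion weights of fine-tuned checkpoints.

Assumptions (not from the paper): a single scalar validation metric is maximized,
all checkpoints share one architecture, and a recent BoTorch exposing
`fit_gpytorch_mll` is installed.
"""
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood


def merge_state_dicts(state_dicts, weights):
    """Weighted average of K checkpoints (simple linear fusion)."""
    weights = weights / weights.sum().clamp_min(1e-8)  # normalize to the simplex
    return {
        name: sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }


def evaluate(weights, state_dicts, build_model, validation_metric):
    """Score the fused model; validation_metric is assumed to return a Python float."""
    model = build_model()  # e.g. a Huggingface Transformers model
    model.load_state_dict(merge_state_dicts(state_dicts, weights))
    return validation_metric(model)  # higher is better


def bo_fusion(state_dicts, build_model, validation_metric, n_init=5, n_iter=20):
    """Search fusion weights in [0, 1]^K with a GP surrogate and expected improvement."""
    K = len(state_dicts)
    bounds = torch.stack([torch.zeros(K), torch.ones(K)]).double()

    # Random initial design.
    X = torch.rand(n_init, K, dtype=torch.double)
    Y = torch.tensor(
        [[evaluate(x, state_dicts, build_model, validation_metric)] for x in X],
        dtype=torch.double,
    )

    for _ in range(n_iter):
        gp = SingleTaskGP(X, Y)
        fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
        acq = ExpectedImprovement(gp, best_f=Y.max())
        cand, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=5, raw_samples=64)
        y = evaluate(cand.squeeze(0), state_dicts, build_model, validation_metric)
        X = torch.cat([X, cand])
        Y = torch.cat([Y, torch.tensor([[y]], dtype=torch.double)])

    best = Y.argmax()
    return X[best] / X[best].sum(), Y[best].item()
```

A caller would supply the K fine-tuned state dicts together with callables that rebuild the architecture and score it on the validation split; the returned weights are normalized to sum to one before the final merge.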