Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Authors: Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method. In this section, we present empirical results demonstrating the effectiveness of BOMF in various NLP tasks.
Researcher Affiliation | Academia | Chaeyun Jang (KAIST), Hyungi Lee (KAIST), Jungtaek Kim (University of Pittsburgh), Juho Lee (KAIST)
Pseudocode | Yes | Refer to Algorithm 1 in Appendix B for the summary of BOMF.
Open Source Code | Yes | Code will be available at: https://github.com/chaeyoon-jang/bomf.git.
Open Datasets | Yes | For the RoBERTa model, we evaluated the performance for classification and utilized a subset of the GLUE benchmark [65]. For the T5-base model, we utilize the Stanford Question Answering Dataset (SQuAD 2.0) [52]. For the summarization task, we employed the Samsung Abstractive Messenger Summarization (SAMSum) dataset [19]. We also use the Korean Medical Multiple Choice Question Answering (KorMedMCQA) dataset [32].
Dataset Splits | Yes | In this paper, we explore the process of fine-tuning PLMs using two types of datasets: a downstream training dataset Dtrn and a validation dataset Dval. The Recognizing Textual Entailment (RTE) task... comprising 2,490 training instances and 277 validation instances.
Hardware Specification | Yes | These experiments are rigorously conducted on high-performance computing hardware, specifically NVIDIA RTX 3090 and NVIDIA RTX A6000 GPUs, to ensure the efficiency and scalability of our models.
Software Dependencies | Yes | Our implementation leverages key libraries, including PyTorch 2.0.1 [49], Huggingface Transformers [69], and BoTorch [4], to construct a robust framework for our experiments.
Experiment Setup | Yes | Additionally, specific details on the fine-tuning methods can be found in Table 5. Details on our fine-tuning procedures are provided in Table 6. Details of our fine-tuning process are provided in Table 7. More specific details about the model and experiments can be found in Table 8.
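The table above describes BOMF (Bayesian optimization-guided model fusion) only through short excerpts; the authoritative description is Algorithm 1 in Appendix B of the paper and the repository linked above. For orientation only, the sketch below shows the general pattern the listed dependencies suggest: using BoTorch to run Bayesian optimization over linear fusion coefficients for a set of fine-tuned checkpoints, scored by a validation metric. All names here (`fuse`, `evaluate_on_validation`, `bo_fusion`), the softmax parameterization of the coefficients, and the single-objective formulation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): Bayesian optimization over linear
# fusion coefficients for fine-tuned checkpoints, using PyTorch + BoTorch.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood


def fuse(checkpoints, weights):
    """Weighted average of model state dicts (illustrative fusion step)."""
    return {
        key: sum(w * ckpt[key] for w, ckpt in zip(weights, checkpoints))
        for key in checkpoints[0]
    }


def bo_fusion(checkpoints, evaluate_on_validation, n_init=5, n_iter=20):
    """Search for fusion coefficients that maximize a validation score.

    `checkpoints` is a list of state dicts from fine-tuning runs;
    `evaluate_on_validation(state_dict) -> float` is a user-supplied
    validation metric. Both are placeholders in this sketch.
    """
    d = len(checkpoints)
    # Unconstrained search box; softmax maps each point onto the simplex.
    bounds = torch.stack([-3.0 * torch.ones(d), 3.0 * torch.ones(d)]).double()

    def objective(x):
        weights = torch.softmax(x, dim=-1)
        return float(evaluate_on_validation(fuse(checkpoints, weights)))

    # Initial random design.
    X = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(n_init, d, dtype=torch.double)
    Y = torch.tensor([[objective(x)] for x in X], dtype=torch.double)

    for _ in range(n_iter):
        # Fit a GP surrogate to the observed (coefficients, score) pairs.
        gp = SingleTaskGP(X, Y)
        fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
        # Propose the next coefficients via expected improvement.
        acq = ExpectedImprovement(gp, best_f=Y.max())
        cand, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=5, raw_samples=64)
        y_new = torch.tensor([[objective(cand.squeeze(0))]], dtype=torch.double)
        X, Y = torch.cat([X, cand]), torch.cat([Y, y_new])

    best = X[Y.argmax()]
    return torch.softmax(best, dim=-1)  # coefficients with the best observed score
```

The paper's actual procedure differs in important details (for example, how the optimization objective is defined and how candidate checkpoints are collected), so this loop should be read strictly as orientation for the table entries, not as a reproduction of BOMF.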