Fusing Models with Complementary Expertise

Authors: Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our method through extensive experimental evaluations on image classification with standard deep learning methods, text classification, summarization, and question answering using Large Language Models (LLMs), and automatic evaluation of generated summaries.
Researcher Affiliation | Collaboration | Carnegie Mellon University; University of Michigan; Intel Labs; MBZUAI; Petuum, Inc.; MIT-IBM Watson AI Lab
Pseudocode | No | The paper describes algorithms conceptually and mathematically but does not include explicit pseudocode blocks or sections labeled "Algorithm".
Open Source Code | Yes | Our implementation is publicly available at https://github.com/hwang595/FoE-ICLR2024.
Open Datasets | Yes | We use 40k images from the CIFAR-100 training set for partitioning and expert training and hold 10k images from the training set out as a validation set to train our fusing strategy by solving equation 3.2.
Dataset Splits | Yes | We use 40k images from the CIFAR-100 training set for partitioning and expert training and hold 10k images from the training set out as a validation set to train our fusing strategy by solving equation 3.2.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper mentions software like ResNet-18, Pegasus models, Hugging Face models, and CatBoost, but does not provide specific version numbers for these or other key software dependencies required for reproducibility.
Experiment Setup | Yes | For training the fusers in FoE we use the AdamW optimizer with an initial learning rate at 0.001 and weight decay at 10^-4. We use a batch size of {64, 128} across various tasks in our experiments. We train the fuser until convergence in all our experiments, which usually takes 10-50 epochs. We also use the cosine annealing learning rate scheduler for all fuser training.
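
The split quoted in the Open Datasets and Dataset Splits rows (40k CIFAR-100 training images for expert training, 10k held out for fitting the fuser) can be reproduced with a standard partition of the 50k-image CIFAR-100 training set. The sketch below is an assumption-based illustration, not the authors' code; the torchvision loader, the random seed, and the use of random_split are all choices made here for concreteness.

```python
# Minimal sketch (not the authors' code): partition CIFAR-100's 50k training
# images into 40k for expert training and 10k for fuser validation, as
# described in the Dataset Splits row above.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR100(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)

# Seed is an arbitrary choice for reproducibility of this sketch.
generator = torch.Generator().manual_seed(0)
expert_train, fuser_val = random_split(full_train, [40_000, 10_000], generator=generator)
```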
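The Experiment Setup row lists the fuser-training hyperparameters (AdamW, learning rate 0.001, weight decay 10^-4, batch size 64 or 128, cosine annealing, 10-50 epochs). The following is a minimal PyTorch sketch of that configuration under stated assumptions: the fuser architecture, the number of experts, the feature dimensions, and the placeholder data are illustrative and do not come from the paper.

```python
# Minimal sketch (not the authors' code) of the fuser-training configuration
# quoted in the Experiment Setup row, assuming each held-out example is
# represented by the concatenated outputs of K experts.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

K, num_classes = 5, 100                                  # illustrative values
expert_outputs = torch.randn(10_000, K * num_classes)    # placeholder expert outputs on the held-out set
labels = torch.randint(0, num_classes, (10_000,))        # placeholder labels
val_loader = DataLoader(TensorDataset(expert_outputs, labels),
                        batch_size=128, shuffle=True)    # paper uses batch size 64 or 128

# Placeholder fuser head; the paper does not specify this architecture.
fuser = nn.Sequential(nn.Linear(K * num_classes, 256), nn.ReLU(),
                      nn.Linear(256, num_classes))

# Hyperparameters as reported: AdamW, lr 1e-3, weight decay 1e-4, cosine annealing.
optimizer = torch.optim.AdamW(fuser.parameters(), lr=1e-3, weight_decay=1e-4)
num_epochs = 50                                          # paper reports convergence in 10-50 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for x, y in val_loader:
        optimizer.zero_grad()
        loss = loss_fn(fuser(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```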