Fusing Models with Complementary Expertise
Authors: Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our method through extensive experimental evaluations on image classification with standard deep learning methods, text classification, summarization, and question answering using Large Language Models (LLMs), and automatic evaluation of generated summaries. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University; University of Michigan; Intel Labs; MBZUAI; Petuum, Inc.; MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper describes algorithms conceptually and mathematically but does not include explicit pseudocode blocks or sections labeled "Algorithm". |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/hwang595/FoE-ICLR2024. |
| Open Datasets | Yes | We use 40k images from the CIFAR-100 training set for partitioning and expert training and hold 10k images from the training set out as a validation set to train our fusing strategy by solving equation 3.2. |
| Dataset Splits | Yes | We use 40k images from the CIFAR-100 training set for partitioning and expert training and hold 10k images from the training set out as a validation set to train our fusing strategy by solving equation 3.2. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions models and libraries such as ResNet-18, Pegasus, Hugging Face models, and CatBoost, but does not provide version numbers for these or other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | For training the fusers in FoE we use the AdamW optimizer with an initial learning rate at 0.001 and weight decay at 10^-4. We use a batch size of {64, 128} across various tasks in our experiments. We train the fuser until convergence in all our experiments, which usually takes 10-50 epochs. We also use the cosine annealing learning rate scheduler for all fuser training. (A configuration sketch follows the table.) |
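
The Open Datasets and Dataset Splits rows quote a 40k/10k split of the CIFAR-100 training set: 40k images for partitioning and expert training, and 10k held out to fit the fuser. The sketch below shows one way to reproduce such a split, assuming PyTorch/torchvision; the fixed seed is an illustrative assumption, not something the paper reports.

```python
# Hypothetical reconstruction of the 40k/10k CIFAR-100 split quoted above.
# The generator seed is an assumption; the paper does not specify one.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_full = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)

# 40k images for expert partitioning/training, 10k held out to train the fuser.
expert_train, fuser_val = random_split(
    train_full, [40_000, 10_000], generator=torch.Generator().manual_seed(0)
)
```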
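
The Experiment Setup row reports the fuser optimization hyperparameters: AdamW with learning rate 0.001 and weight decay 10^-4, a batch size of 64 or 128, a cosine annealing schedule, and convergence in roughly 10-50 epochs. The following is a minimal configuration sketch under those settings, assuming PyTorch; the fuser module, loss, and data loader are placeholders standing in for the paper's fuser and expert outputs, not the authors' architecture.

```python
# Hypothetical training loop matching the reported fuser hyperparameters.
# Only the optimizer, scheduler, batch size, and epoch range come from the
# quoted setup; the model and data below are illustrative stand-ins.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

fuser = nn.Linear(100, 10)  # placeholder fuser over expert outputs
dataset = TensorDataset(torch.randn(10_000, 100), torch.randint(0, 10, (10_000,)))
loader = DataLoader(dataset, batch_size=128, shuffle=True)  # batch size 64 or 128
loss_fn = nn.CrossEntropyLoss()

num_epochs = 50  # the paper reports convergence in roughly 10-50 epochs
optimizer = torch.optim.AdamW(fuser.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for _ in range(num_epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(fuser(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine annealing step once per epoch
```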