Foundation Model is Efficient Multimodal Multitask Model Selector
Authors: Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 5 downstream tasks with 24 datasets show that EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario. |
| Researcher Affiliation | Collaboration | Fanqing Meng, OpenGVLab, Shanghai AI Laboratory & Shanghai Jiao Tong University (mengfanqing33@gmail.com); Wenqi Shao, OpenGVLab, Shanghai AI Laboratory (shaowenqi@pjlab.org.cn); Zhanglin Peng, The University of Hong Kong; Chonghe Jiang, The Chinese University of Hong Kong; Kaipeng Zhang, OpenGVLab, Shanghai AI Laboratory; Yu Qiao, OpenGVLab, Shanghai AI Laboratory; Ping Luo, The University of Hong Kong & OpenGVLab, Shanghai AI Laboratory (pluo@cs.hku.hk) |
| Pseudocode | Yes | Algorithm 1: Alternating Minimization; Algorithm 2: Fast Alternating Minimization (a hedged alternating-minimization sketch follows the table) |
| Open Source Code | Yes | The code is available at https://github.com/OpenGVLab/Multitask-Model-Selector. |
| Open Datasets | Yes | For image classification, we adopt 11 classification benchmarks, including FGVC Aircraft [47], Caltech-101 [48], Stanford Cars [49], CIFAR-10 [50], CIFAR-100 [50], DTD [51], Oxford 102 Flowers [52], Food-101 [53], Oxford-IIIT Pets [54], SUN397 [55], and VOC2007 [56]. (A hedged dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | For referring expression comprehension, the standard metric Acc@0.5 on the validation set is used as the ground truth. ...However, since Flickr10k-H and Flickr10k-R do not provide a test set, we use a 6:1 ratio to divide the original training set of 7000 images into a training set and a test set. (A split sketch follows the table.) |
| Hardware Specification | Yes | We use a Tesla V100 with a batch size of 128 to perform finetuning. ...We use an Nvidia A100 with a batch size of 64 to perform finetuning. ...All experiments are implemented on an NVIDIA Tesla A100 GPU. ...For each task, we use 8 Nvidia A100 GPUs for label embedding, with a batch size of 512 for each GPU. |
| Software Dependencies | No | The paper mentions software components like CLIP, BERT, GPT-2, BART, and ELECTRA as foundation models and discusses optimizers like SGD and AdamW. However, it does not specify version numbers for these software components or any other libraries used (e.g., PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Therefore, we carefully fine-tune pre-trained models with a grid search of learning rate in {1e-1, 1e-2, 1e-3, 1e-4} and weight decay in {1e-3, 1e-4, 1e-5, 1e-6, 0}, using the SGD optimizer. ...We use a Tesla V100 with a batch size of 128 to perform finetuning. All input images are resized to 224 × 224. (A fine-tuning grid-search sketch follows the table.) |
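
The Pseudocode row cites Algorithm 1 (Alternating Minimization) and Algorithm 2 (Fast Alternating Minimization). Below is a minimal NumPy sketch of generic alternating minimization for a bilinear least-squares objective of the form min over (w, t) of (1/2N)·||Xw − Yt||², with t kept on the probability simplex. The objective shape, the simplex projection, and the approximate t-update are illustrative assumptions and do not reproduce the paper's exact algorithms.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def alternating_minimization(X, Y, n_iters=50):
    """Alternate between regression weights w and combination weights t for
    min_{w,t} (1/2N) * ||X @ w - Y @ t||^2, with t constrained to the simplex.
    This is a generic illustration, not the paper's exact update rules.
    """
    N, _ = X.shape
    _, K = Y.shape
    t = np.full(K, 1.0 / K)  # start from a uniform combination of label embeddings
    w = None
    for _ in range(n_iters):
        # w-step: exact least-squares update against the current target Y @ t
        w, *_ = np.linalg.lstsq(X, Y @ t, rcond=None)
        # t-step: approximate update -- unconstrained least squares on Y,
        # then projection back onto the probability simplex
        t_ls, *_ = np.linalg.lstsq(Y, X @ w, rcond=None)
        t = project_to_simplex(t_ls)
    score = np.linalg.norm(X @ w - Y @ t) ** 2 / (2 * N)
    return w, t, score

# Toy usage: 100 samples, 16-dim model features, 3 candidate label embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))
Y = rng.standard_normal((100, 3))
w, t, score = alternating_minimization(X, Y)
print(t, score)
```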
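
For the Open Datasets row, several of the 11 classification benchmarks ship with torchvision wrappers. The snippet below is a hedged loading sketch only; exact class names, split arguments, and download support depend on the installed torchvision version, and benchmarks such as SUN397, Stanford Cars, or VOC2007 may require manual download.

```python
import torchvision.transforms as T
from torchvision import datasets

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])
root = "./data"

# A subset of the 11 classification benchmarks via their torchvision dataset classes.
cifar10  = datasets.CIFAR10(root, train=True, transform=transform, download=True)
cifar100 = datasets.CIFAR100(root, train=True, transform=transform, download=True)
dtd      = datasets.DTD(root, split="train", transform=transform, download=True)
flowers  = datasets.Flowers102(root, split="train", transform=transform, download=True)
food     = datasets.Food101(root, split="train", transform=transform, download=True)
pets     = datasets.OxfordIIITPet(root, split="trainval", transform=transform, download=True)
```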
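
The Dataset Splits row describes dividing a 7000-image training set with a 6:1 ratio because Flickr10k-H and Flickr10k-R provide no test set. A minimal split sketch follows; the random shuffling and fixed seed are assumptions, since the extracted text does not state how the split is drawn.

```python
import random

def split_six_to_one(n_items=7000, seed=0):
    """Split item indices 6:1 into train/test (6000 / 1000 for 7000 images)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: a random, seeded split
    n_train = n_items * 6 // 7
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_six_to_one()
print(len(train_idx), len(test_idx))  # 6000 1000
```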
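
The Experiment Setup row describes computing ground-truth transferability by fine-tuning each pre-trained model with SGD over a grid of learning rates and weight decays, with 224 × 224 inputs and a batch size of 128. The PyTorch sketch below mirrors that recipe; `build_model`, `train_set`, `val_set`, the momentum value, and the epoch count are assumptions not specified in the extracted text.

```python
import itertools
import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader

LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4]
WEIGHT_DECAYS = [1e-3, 1e-4, 1e-5, 1e-6, 0.0]

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])  # inputs resized to 224x224

def evaluate(model, dataset, device, batch_size=128):
    """Top-1 accuracy on a held-out set."""
    loader = DataLoader(dataset, batch_size=batch_size)
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def finetune(model, train_set, val_set, lr, wd, epochs=30, device="cuda"):
    """Fine-tune one (lr, wd) grid cell with SGD; epochs=30 and momentum=0.9 are assumptions."""
    loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=wd)
    criterion = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
    return evaluate(model, val_set, device)

# Grid search: keep the best validation accuracy as the ground-truth score.
# build_model(), train_set, and val_set are hypothetical placeholders.
best_acc = max(
    finetune(build_model(), train_set, val_set, lr=lr, wd=wd)
    for lr, wd in itertools.product(LEARNING_RATES, WEIGHT_DECAYS)
)
```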