Foundation Model is Efficient Multimodal Multitask Model Selector

Authors: Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 5 downstream tasks with 24 datasets show that EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario.
Researcher Affiliation | Collaboration | Fanqing Meng (OpenGVLab, Shanghai AI Laboratory; Shanghai Jiao Tong University) mengfanqing33@gmail.com; Wenqi Shao (OpenGVLab, Shanghai AI Laboratory) shaowenqi@pjlab.org.cn; Zhanglin Peng (The University of Hong Kong); Chonghe Jiang (The Chinese University of Hong Kong); Kaipeng Zhang (OpenGVLab, Shanghai AI Laboratory); Yu Qiao (OpenGVLab, Shanghai AI Laboratory); Ping Luo (The University of Hong Kong; OpenGVLab, Shanghai AI Laboratory) pluo@cs.hku.hk
Pseudocode | Yes | Algorithm 1: Alternating Minimization; Algorithm 2: Fast Alternating Minimization. (A minimal sketch of the alternating-minimization step follows the table.)
Open Source Code | Yes | The code is available at https://github.com/OpenGVLab/Multitask-Model-Selector.
Open Datasets | Yes | For image classification, we adopt 11 classification benchmarks, including FGVC-Aircraft [47], Caltech-101 [48], Stanford Cars [49], CIFAR-10 [50], CIFAR-100 [50], DTD [51], Oxford 102 Flowers [52], Food-101 [53], Oxford-IIIT Pets [54], SUN397 [55], and VOC2007 [56].
Dataset Splits | Yes | For referring expression comprehension, the standard metric Acc@0.5 on the validation set is used as the ground truth. ... However, since Flickr10k-H and Flickr10k-R do not provide a test set, we use a 6:1 ratio to divide the original training set of 7000 images into a training set and a test set. (A split sketch follows the table.)
Hardware Specification | Yes | We use a Tesla V100 with a batch size of 128 to perform finetuning. ... We use an Nvidia A100 with a batch size of 64 to perform finetuning. ... All experiments are implemented on an NVIDIA Tesla A100 GPU. ... For each task, we use 8 Nvidia A100 GPUs for label embedding, with a batch size of 512 for each GPU.
Software Dependencies | No | The paper mentions software components like CLIP, BERT, GPT-2, BART, and ELECTRA as foundation models and discusses optimizers like SGD and AdamW. However, it does not specify version numbers for these software components or for any other libraries used (e.g., PyTorch or TensorFlow versions).
Experiment Setup | Yes | Therefore, we carefully fine-tune pre-trained models with a grid search of learning rate in {1e-1, 1e-2, 1e-3, 1e-4} and weight decay in {1e-3, 1e-4, 1e-5, 1e-6, 0}, using the SGD optimizer. ... We use a Tesla V100 with a batch size of 128 to perform finetuning. All input images are resized to 224 × 224. (A grid-search sketch follows the table.)
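
The Pseudocode row cites Algorithm 1 (Alternating Minimization), which in EMMS alternates between a closed-form least-squares update and a constrained update of the mixture weights over foundation-model label embeddings. Below is a minimal sketch of that idea, not the authors' implementation: the shapes of the feature matrix `X` and the stacked label embeddings `Z`, the projected-gradient step for the simplex-constrained weights, and the sign convention of the returned score are illustrative assumptions.

```python
# Minimal sketch of the alternating-minimization idea named in Algorithm 1
# (not the authors' code). Assumptions: X is an (N x D) feature matrix from a
# candidate pre-trained model, Z is a (K x N x L) stack of label embeddings
# from K foundation models, and the objective is
#   min_{W, beta}  ||X W - sum_k beta_k Z_k||_F^2,  beta on the simplex.
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def emms_score(X, Z, n_iters=20, step=0.1):
    """Alternate a closed-form W update with a projected-gradient beta update."""
    K, N, _ = Z.shape
    beta = np.full(K, 1.0 / K)             # start from the uniform mixture
    X_pinv = np.linalg.pinv(X)             # reused across iterations
    for _ in range(n_iters):
        T = np.tensordot(beta, Z, axes=1)  # mixed label embedding, (N x L)
        W = X_pinv @ T                     # least-squares step for W
        R = X @ W                          # current regression output
        # gradient of 0.5 * ||R - T||^2 with respect to each beta_k
        grad = np.array([np.sum((T - R) * Z[k]) for k in range(K)])
        beta = project_simplex(beta - step * grad)  # step size is illustrative
    T = np.tensordot(beta, Z, axes=1)
    residual = np.linalg.norm(X @ (X_pinv @ T) - T) ** 2 / N
    return -residual                       # lower fitting error -> higher score (assumed sign)
```

In this sketch, candidate pre-trained models would be ranked by `emms_score` computed from their frozen features and a shared stack of label embeddings (e.g., from CLIP, BERT, and GPT-2, as listed in the Software Dependencies row).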
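The Dataset Splits row quotes a 6:1 division of a 7,000-image training set. A minimal index-level sketch of such a split is below; the fixed seed and the index-based interface are illustrative assumptions, not the authors' preprocessing script.

```python
# Minimal sketch of a 6:1 train/test split over 7000 training images
# (illustrative; not the authors' script).
import random

def split_6_to_1(num_images=7000, seed=0):
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    cut = num_images * 6 // 7            # 6:1 ratio -> 6000 train / 1000 test
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_6_to_1()
```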
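The Experiment Setup row describes a learning-rate by weight-decay grid search with SGD, a batch size of 128, and 224 × 224 inputs. The sketch below only enumerates that grid; `train_model` and `evaluate` are hypothetical stand-ins for the fine-tuning and evaluation pipeline.

```python
# Minimal sketch of the quoted hyperparameter grid search. train_model and
# evaluate are hypothetical placeholders for the actual fine-tuning pipeline.
from itertools import product

LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4]
WEIGHT_DECAYS = [1e-3, 1e-4, 1e-5, 1e-6, 0.0]

def grid_search(train_model, evaluate):
    best_config, best_acc = None, float("-inf")
    for lr, wd in product(LEARNING_RATES, WEIGHT_DECAYS):
        model = train_model(optimizer="SGD", lr=lr, weight_decay=wd,
                            batch_size=128, image_size=224)
        acc = evaluate(model)
        if acc > best_acc:
            best_config, best_acc = (lr, wd), acc
    return best_config, best_acc
```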