ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization

Authors: Qishi Dong, Awais Muhammad, Fengwei Zhou, Chuanlong Xie, Tianyang Hu, Yongxin Yang, Sung-Ho Bae, Zhenguo Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our paradigm on a diverse model zoo consisting of 35 models for various OoD tasks and demonstrate: (i) model ranking is better correlated with fine-tuning ranking than previous methods and up to 9859x faster than brute-force fine-tuning; (ii) OoD generalization after model ensemble with feature selection outperforms the state-of-the-art methods, and the accuracy on the most challenging task, DomainNet, is improved from 46.5% to 50.6%. In this section, we demonstrate the effectiveness of ZooD. First, we evaluate the ability of our ranking metric to estimate OoD performance and compare it with the ground-truth performance and several existing IID ranking methods. Second, we show that our aggregation method achieves significant improvements and SOTA results on several OoD datasets. (See the ranking-correlation sketch after this table.)
Researcher Affiliation | Collaboration | Qishi Dong (2,1), Awais Muhammad (3,1), Fengwei Zhou (1), Chuanlong Xie (4,1), Tianyang Hu (1), Yongxin Yang (1), Sung-Ho Bae (3), Zhenguo Li (1); 1 Huawei Noah's Ark Lab, 2 Hong Kong Baptist University, 3 Kyung Hee University, 4 Beijing Normal University
Pseudocode | Yes | Algorithm 1: Pseudocode of Variational EM Algorithm for Bayesian Feature Selection. (A hedged illustration of this algorithm family appears after the table.)
Open Source Code | No | Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/zood. Will be released upon publication.
Open Datasets | Yes | We conduct experiments on six OoD datasets: PACS [43], VLCS [24], Office-Home [77], Terra Incognita [10], DomainNet [63], and NICO (NICO-Animals & NICO-Vehicles) [31].
Dataset Splits | Yes | The standard way to conduct the experiment is to choose one domain as the test (unseen) domain and use the remaining domains as training domains, known as the leave-one-domain-out protocol. We adopt the leave-one-domain-out cross-validation setup in DomainBed with 10 experiments for hyper-parameter selection and run 3 trials. (See the split sketch after this table.)
Hardware Specification | No | The paper mentions 'GPU hours' and 'GPU years' in Section 4.3 and Table 1c, and states that it includes 'the type of resources used' in the checklist, but it does not specify concrete hardware details such as specific GPU models (e.g., V100, A100), CPU models, or cloud providers.
Software Dependencies | No | The paper mentions software like 'MindSpore' and references 'PyTorch' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We adopt the leave-one-domain-out cross-validation setup in DomainBed with 10 experiments for hyper-parameter selection and run 3 trials. We triple the number of training iterations for DomainNet (5,000 to 15,000), as it is a large-scale dataset requiring more iterations [17], and decrease the number of experiments for hyper-parameter selection from 10 to 5. (See the search-loop sketch after this table.)
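
The ranking claim in the Research Type row is easy to sanity-check in code: given a proxy score per zoo model (computed from frozen features, with no fine-tuning) and the ground-truth OoD accuracy after fine-tuning, a rank correlation measures how well the metric predicts the fine-tuning ranking. A minimal sketch, assuming both quantities are already available as arrays; all numbers below are hypothetical, not the paper's:

```python
# Sanity-check a transferability ranking metric against ground-truth
# fine-tuning results using rank correlations. Scores are hypothetical.
from scipy.stats import spearmanr, weightedtau

# Proxy scores computed from pre-trained features (no fine-tuning needed).
proxy_scores = [0.71, 0.64, 0.83, 0.55, 0.78]
# Ground-truth OoD accuracies after fully fine-tuning each zoo model.
finetuned_acc = [61.2, 58.9, 66.4, 52.1, 63.7]

rho, _ = spearmanr(proxy_scores, finetuned_acc)
tau, _ = weightedtau(proxy_scores, finetuned_acc)  # weights top ranks more
print(f"Spearman rho = {rho:.3f}, weighted Kendall tau = {tau:.3f}")
```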
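The paper's Algorithm 1 is a variational EM procedure for Bayesian feature selection; its exact updates are given in the paper. As a stand-in, the sketch below implements a closely related classic, EM-based automatic relevance determination (ARD) for Bayesian linear regression, which alternates the same way: an E-step computing the Gaussian posterior over weights, and an M-step re-estimating per-feature precisions so that irrelevant features are driven out. Function names and thresholds are illustrative, not the paper's:

```python
# A minimal EM-style ARD sketch: NOT the paper's exact Algorithm 1, but the
# same alternating E/M structure used for Bayesian feature selection.
import numpy as np

def ard_feature_selection(X, y, noise_var=1.0, n_iters=50, prune_thresh=1e3):
    """Features whose prior precision alpha_j diverges are pruned."""
    d = X.shape[1]
    alpha = np.ones(d)                      # per-feature prior precisions
    for _ in range(n_iters):
        # E-step: Gaussian posterior over weights given current precisions.
        Sigma = np.linalg.inv(np.diag(alpha) + X.T @ X / noise_var)
        mu = Sigma @ X.T @ y / noise_var
        # M-step: EM re-estimate of each precision; it blows up for
        # features whose posterior weight stays near zero.
        alpha = 1.0 / (mu**2 + np.diag(Sigma))
    keep = alpha < prune_thresh             # finite precision => keep feature
    return keep, mu

# Toy usage: only the first 3 of 10 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.normal(size=200)
keep, mu = ard_feature_selection(X, y)
print("selected features:", np.flatnonzero(keep))
```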
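The leave-one-domain-out protocol from the Dataset Splits row, spelled out as a loop. The PACS domain names are real; train_on and evaluate_on are hypothetical placeholders for a DomainBed-style training and evaluation run:

```python
# Leave-one-domain-out: each domain serves once as the unseen test domain.
domains = ["art_painting", "cartoon", "photo", "sketch"]  # PACS domains

for test_domain in domains:
    train_domains = [d for d in domains if d != test_domain]
    # model = train_on(train_domains)        # placeholder training call
    # acc = evaluate_on(model, test_domain)  # placeholder evaluation call
    print(f"train on {train_domains}, test on {test_domain}")
```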
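Finally, the Experiment Setup row as a DomainBed-style random search skeleton: a fixed number of randomly sampled hyper-parameter configurations, each repeated over several seeds. The sampling ranges are illustrative assumptions, not the paper's exact search space, and run_experiment is a hypothetical stand-in:

```python
# DomainBed-style random hyper-parameter search with repeated trials.
import random

def sample_hparams(rng):
    # Illustrative log-uniform ranges; not the paper's exact search space.
    return {
        "lr": 10 ** rng.uniform(-5, -3.5),
        "weight_decay": 10 ** rng.uniform(-6, -2),
        "batch_size": int(2 ** rng.uniform(3, 5.5)),
    }

n_hparam, n_trials = 10, 3          # 5 configurations for DomainNet per the setup above
for h in range(n_hparam):
    hparams = sample_hparams(random.Random(h))
    for seed in range(n_trials):
        # run_experiment(hparams, seed, n_iterations=5000)  # 15000 for DomainNet
        pass
```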