Model Spider: Learning to Rank Pre-Trained Models Efficiently
Authors: Yi-Kai Zhang, Ting-Ji Huang, Yao-Xiang Ding, De-Chuan Zhan, Han-Jia Ye
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MODEL SPIDER on three benchmarks: the PTM zoo comprising heterogeneous models from single-source or multi-source datasets, or composed of large language models. We analyze the influence of key components in MODEL SPIDER and visualize the ability of a PTM using spider charts based on the learned representation. |
| Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, Nanjing University, China 2State Key Lab of CAD & CG, Zhejiang University |
| Pseudocode | Yes | Algorithm 1 The Training Part of the MODEL SPIDER (a hedged sketch of the generic ranking-loss step follows the table) |
| Open Source Code | Yes | Code is available at https://github.com/zhangyikaii/Model-Spider. |
| Open Datasets | Yes | We evaluate various methods on 9 downstream datasets, i.e. Aircraft [59], Caltech101 [32], Cars [47], CIFAR10 [49], CIFAR100 [49], DTD [19], Pet [73], and SUN397 [107] for classification, UTKFace [118] and dSprites [61] for regression. |
| Dataset Splits | Yes | Specifically, we grid search the learning rates (7 learning rates from $10^{-1}$ to $10^{-4}$, logarithmically spaced) and weight decays (7 weight decays from $10^{-6}$ to $10^{-3}$, logarithmically spaced) to select the best hyper-parameter on the validation set and compute the accuracy on the downstream test set. (A minimal grid-construction sketch follows the table.) |
| Hardware Specification | Yes | We build the model zoo with around 5K GPU hours (on NVIDIA V100 GPUs). |
| Software Dependencies | No | The paper mentions software such as PyTorch and EsViT, but it does not provide version numbers for these or any other libraries/solvers. |
| Experiment Setup | Yes | We meticulously conduct a grid-search of hyper-parameters, such as optimizers, learning rates, and weight decays (2 optimizers as SGD or Adam, 6 learning rates from $5\times10^{-2}$ to $10^{-4}$, and 3 weight decay values from $5\times10^{-4}$ to $10^{-5}$, batch size of 128, and the maximum epoch of 100). (A hedged sweep sketch follows the table.) |
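
The "Pseudocode" row points to Algorithm 1, the training part of MODEL SPIDER, which learns to rank PTMs for a downstream task. The paper's exact algorithm is not reproduced in this report; the sketch below only illustrates the generic pairwise learning-to-rank step such training builds on, scoring model representations against a task representation and penalizing mis-ordered pairs. All names and shapes (`model_reprs`, `task_repr`, `score_head`) are hypothetical assumptions, not the paper's API.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of one pairwise learning-to-rank step, NOT the
# paper's Algorithm 1: score M candidate PTMs against a single task
# and push correctly ordered pairs apart with a margin ranking loss.
M, D = 10, 128                        # number of PTMs, embedding dim
model_reprs = torch.randn(M, D)       # stand-in PTM representations
task_repr = torch.randn(D)            # stand-in task representation
gt_transfer = torch.randn(M)          # ground-truth transferability scores

score_head = nn.Bilinear(D, D, 1)     # model-task similarity head
opt = torch.optim.Adam(score_head.parameters(), lr=1e-3)
rank_loss = nn.MarginRankingLoss(margin=0.1)

scores = score_head(model_reprs, task_repr.expand(M, D)).squeeze(-1)

# For each pair (i, j), the target is +1 if model i truly transfers
# better than model j, and -1 otherwise.
i, j = torch.triu_indices(M, M, offset=1)
target = (gt_transfer[i] > gt_transfer[j]).float() * 2 - 1
loss = rank_loss(scores[i], scores[j], target)

opt.zero_grad()
loss.backward()
opt.step()
```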
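
The "Dataset Splits" row quotes a 7 x 7 search over logarithmically spaced learning rates and weight decays, with the best configuration picked on the validation set. A minimal sketch of that grid construction and selection loop; `val_accuracy` is a hypothetical placeholder for the actual train-and-validate step, not a function from the paper's code.

```python
import itertools
import numpy as np

# 7 logarithmically spaced values each, matching the quoted ranges.
learning_rates = np.logspace(-1, -4, num=7)   # 1e-1 ... 1e-4
weight_decays = np.logspace(-6, -3, num=7)    # 1e-6 ... 1e-3

def val_accuracy(lr, wd):
    # Placeholder: train with (lr, wd) and return validation accuracy.
    return float(np.random.rand())

best_lr, best_wd = max(itertools.product(learning_rates, weight_decays),
                       key=lambda cfg: val_accuracy(*cfg))
print(f"best lr={best_lr:.1e}, wd={best_wd:.1e}")  # then report test accuracy
```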
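
Similarly, the "Experiment Setup" row quotes a sweep over {SGD, Adam} x 6 learning rates x 3 weight decays at batch size 128 for up to 100 epochs. A hedged sketch of enumerating that sweep in PyTorch: the paper gives only the grid endpoints and counts, so the geometric spacing is an assumption, and `fine_tune` is a hypothetical helper.

```python
import itertools
import numpy as np
import torch

optimizers = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam}
# Spacing between the quoted endpoints is assumed, not stated.
learning_rates = np.geomspace(5e-2, 1e-4, num=6)
weight_decays = np.geomspace(5e-4, 1e-5, num=3)

model = torch.nn.Linear(512, 10)  # stand-in for the fine-tuned model

for name, lr, wd in itertools.product(optimizers, learning_rates, weight_decays):
    opt = optimizers[name](model.parameters(), lr=lr, weight_decay=wd)
    # fine_tune(model, opt, batch_size=128, max_epochs=100)  # hypothetical
    print(f"{name}: lr={lr:.1e}, wd={wd:.1e}")
```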