LogME: Practical Assessment of Pre-trained Models for Transfer Learning
Authors: Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with brute-force fine-tuning, LogME brings at most 3000× speedup in wall-clock time and requires only 1% memory footprint. It outperforms prior methods by a large margin in their setting and is applicable to new settings. It is general enough for diverse pre-trained models (supervised pre-trained and unsupervised pre-trained), downstream tasks (classification and regression), and modalities (vision and language). Code is available at this repository: https://github.com/thuml/LogME. |
| Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University, Beijing 100084, China. |
| Pseudocode | Yes | Algorithm 1: LogME (a hedged sketch of the evidence computation appears after this table). |
| Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/LogME. |
| Open Datasets | Yes | For downstream classification tasks, we take 9 commonly used datasets: Aircraft (Maji et al., 2013), Birdsnap (Berg et al., 2014), Caltech (Fei-Fei et al., 2004), Cars (Krause et al., 2013), CIFAR10 (Krizhevsky & Hinton, 2009), CIFAR100 (Krizhevsky & Hinton, 2009), DTD (Cimpoi et al., 2014), Pets (Parkhi et al., 2012), and SUN (Xiao et al., 2010). |
| Dataset Splits | Yes | Hence we grid search learning rates and weight decays (7 learning rates from 10^-1 to 10^-4, 7 weight decays from 10^-6 to 10^-3, all logarithmically spaced) to select the best hyper-parameter on the validation set and compute the accuracy on the test set. |
| Hardware Specification | No | The paper provides wall-clock time and memory footprint but does not specify the exact hardware components (e.g., specific GPU or CPU models) used for the experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch, TensorFlow, Hugging Face Transformers, and SciPy, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Hence we grid search learning rates and weight decays (7 learning rates from 10^-1 to 10^-4, 7 weight decays from 10^-6 to 10^-3, all logarithmically spaced) to select the best hyper-parameter on the validation set and compute the accuracy on the test set. (A grid-search sketch follows the table.) |
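
For reference, below is a minimal sketch of the evidence computation behind Algorithm 1 (LogME): a Bayesian linear model over fixed pre-trained features, with fixed-point updates of the prior precision `alpha` and noise precision `beta` followed by the per-sample log evidence. The function names, convergence tolerance, and NumPy structure here are our own assumptions; the official implementation at https://github.com/thuml/LogME is the authoritative version.

```python
import numpy as np

def log_evidence(f, y, max_iter=100, tol=1e-3):
    """Per-sample log maximum evidence of a Bayesian linear model for one
    target column y, given fixed features f of shape (n, d). Sketch only."""
    n, d = f.shape
    # Eigendecomposition of f^T f: eigenvalues sigma, eigenvectors v.
    sigma, v = np.linalg.eigh(f.T @ f)
    sigma = np.clip(sigma, 0.0, None)   # guard against tiny negative eigenvalues
    fy = v.T @ (f.T @ y)                # target projected into the eigenbasis
    alpha, beta = 1.0, 1.0              # prior and noise precisions
    for _ in range(max_iter):
        # Posterior mean of the weights, expressed in the eigenbasis.
        m = (beta * fy) / (alpha + beta * sigma)
        gamma = np.sum(beta * sigma / (alpha + beta * sigma))
        res2 = np.sum((f @ (v @ m) - y) ** 2)   # squared residual
        m2 = np.sum(m ** 2)                     # squared weight norm
        alpha_new, beta_new = gamma / m2, (n - gamma) / res2
        converged = (abs(alpha_new - alpha) / alpha < tol
                     and abs(beta_new - beta) / beta < tol)
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    # Evidence at the converged (alpha, beta).
    m = (beta * fy) / (alpha + beta * sigma)
    res2 = np.sum((f @ (v @ m) - y) ** 2)
    m2 = np.sum(m ** 2)
    return (n / 2 * np.log(beta) + d / 2 * np.log(alpha)
            - n / 2 * np.log(2 * np.pi)
            - beta / 2 * res2 - alpha / 2 * m2
            - np.sum(np.log(alpha + beta * sigma)) / 2) / n

def logme(features, labels):
    """Average log evidence over one-hot label columns (classification)."""
    onehot = np.eye(labels.max() + 1)[labels]
    return np.mean([log_evidence(features, onehot[:, k])
                    for k in range(onehot.shape[1])])
```

Higher LogME scores indicate pre-trained features whose linear evidence for the downstream labels is stronger, which is the paper's proxy for transferability.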
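
The 7 × 7 hyper-parameter grid for the fine-tuning baseline can likewise be sketched directly from the quoted setup. The `fine_tune_and_evaluate` routine below is a hypothetical placeholder, not part of the paper's released code:

```python
import numpy as np
from itertools import product

def fine_tune_and_evaluate(lr: float, wd: float) -> float:
    """Placeholder (hypothetical): fine-tune the pre-trained model with the
    given learning rate and weight decay, return validation accuracy."""
    raise NotImplementedError

# 7 learning rates from 1e-1 to 1e-4 and 7 weight decays from 1e-6 to 1e-3,
# all logarithmically spaced, as described in the paper.
learning_rates = np.logspace(-1, -4, num=7)
weight_decays = np.logspace(-6, -3, num=7)

best_acc, best_lr, best_wd = -1.0, None, None
for lr, wd in product(learning_rates, weight_decays):
    acc = fine_tune_and_evaluate(lr, wd)
    if acc > best_acc:
        best_acc, best_lr, best_wd = acc, lr, wd
# Test-set accuracy is then computed once with the selected (best_lr, best_wd).
```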