LogME: Practical Assessment of Pre-trained Models for Transfer Learning

Authors: Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with brute-force fine-tuning, LogME brings at most 3000× speedup in wall-clock time and requires only 1% memory footprint. It outperforms prior methods by a large margin in their setting and is applicable to new settings. It is general enough for diverse pre-trained models (supervised pre-trained and unsupervised pre-trained), downstream tasks (classification and regression), and modalities (vision and language). Code is available at this repository: https://github.com/thuml/LogME.
Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University, Beijing 100084, China.
Pseudocode | Yes | Algorithm 1: LogME (a sketch of the computation is given below the table).
Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/LogME.
Open Datasets | Yes | For downstream classification tasks, we take 9 commonly used datasets: Aircraft (Maji et al., 2013), Birdsnap (Berg et al., 2014), Caltech (Fei-Fei et al., 2004), Cars (Krause et al., 2013), CIFAR10 (Krizhevsky & Hinton, 2009), CIFAR100 (Krizhevsky & Hinton, 2009), DTD (Cimpoi et al., 2014), Pets (Parkhi et al., 2012), and SUN (Xiao et al., 2010).
Dataset Splits | Yes | Hence we grid search learning rates and weight decays (7 learning rates from 10^-1 to 10^-4, 7 weight decays from 10^-6 to 10^-3, all logarithmically spaced) to select the best hyper-parameter on the validation set and compute the accuracy on the test set.
Hardware Specification | No | The paper provides wall-clock time and memory footprint but does not specify the exact hardware components (e.g., specific GPU or CPU models) used for the experiments.
Software Dependencies | No | The paper mentions software such as PyTorch, TensorFlow, Hugging Face Transformers, and SciPy, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | Hence we grid search learning rates and weight decays (7 learning rates from 10^-1 to 10^-4, 7 weight decays from 10^-6 to 10^-3, all logarithmically spaced) to select the best hyper-parameter on the validation set and compute the accuracy on the test set. (The hyper-parameter grid is sketched after the table.)
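
The Pseudocode row refers to Algorithm 1 (LogME), which maximizes the log marginal evidence of a Bayesian linear model on the features extracted by a pre-trained model, using a fixed-point iteration over the precision hyper-parameters alpha and beta. Below is a minimal NumPy sketch of that computation, assuming a feature matrix `f` of shape (n, D) and a single real-valued target vector `y`; the function and variable names are illustrative and this is not the reference implementation from the repository.

```python
import numpy as np

def logme_score(f, y, tol=1e-3, max_iter=100):
    """Hedged sketch of LogME evidence maximization for one target vector.

    f: (n, D) feature matrix extracted by a pre-trained model.
    y: (n,) target vector (a one-hot column for classification, or a
       regression target).
    Returns the per-sample log maximum evidence.
    """
    n, d = f.shape
    # Thin SVD of the feature matrix: f = u @ diag(s) @ vT
    u, s, vT = np.linalg.svd(f, full_matrices=False)
    sigma = s ** 2                 # squared singular values of f
    z = u.T @ y                    # targets projected onto the left singular vectors

    alpha, beta = 1.0, 1.0
    for _ in range(max_iter):
        # Effective number of well-determined parameters
        gamma = np.sum(beta * sigma / (alpha + beta * sigma))
        # Posterior mean of the linear weights, computed in the singular basis
        m = vT.T @ (beta * s * z / (alpha + beta * sigma))
        res2 = np.sum((f @ m - y) ** 2)    # squared residual of the posterior mean
        m2 = np.sum(m ** 2)                # squared norm of the posterior mean
        alpha_new = gamma / (m2 + 1e-12)
        beta_new = (n - gamma) / (res2 + 1e-12)
        converged = (abs(alpha_new - alpha) / alpha < tol
                     and abs(beta_new - beta) / beta < tol)
        alpha, beta = alpha_new, beta_new
        if converged:
            break

    # Recompute posterior statistics at the final (alpha, beta)
    m = vT.T @ (beta * s * z / (alpha + beta * sigma))
    res2 = np.sum((f @ m - y) ** 2)
    m2 = np.sum(m ** 2)

    # Log evidence; log|A| has d eigenvalues, those outside the SVD span equal alpha
    logdet_A = np.sum(np.log(alpha + beta * sigma)) + (d - len(sigma)) * np.log(alpha)
    evidence = (d / 2 * np.log(alpha) + n / 2 * np.log(beta)
                - n / 2 * np.log(2 * np.pi)
                - beta / 2 * res2 - alpha / 2 * m2
                - 0.5 * logdet_A)
    return evidence / n
```

For a K-class classification task the score would be averaged over the K one-hot columns of the label matrix, and the candidate pre-trained model with the highest LogME value is predicted to transfer best.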
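
The fine-tuning baseline quoted in the Dataset Splits and Experiment Setup rows selects hyper-parameters from a 7 × 7 logarithmic grid. The snippet below is a minimal sketch of that selection procedure; `fine_tune_and_validate` is a hypothetical stand-in for a full fine-tuning run and not a function from the paper's code.

```python
import itertools
import numpy as np

def fine_tune_and_validate(lr: float, wd: float) -> float:
    """Hypothetical stand-in: fine-tune the pre-trained model with the given
    learning rate and weight decay, and return accuracy on the validation split."""
    raise NotImplementedError

# 7 learning rates in [1e-4, 1e-1] and 7 weight decays in [1e-6, 1e-3],
# all logarithmically spaced: 49 fine-tuning runs per (model, dataset) pair.
learning_rates = np.logspace(-4, -1, num=7)
weight_decays = np.logspace(-6, -3, num=7)

best = None
for lr, wd in itertools.product(learning_rates, weight_decays):
    val_acc = fine_tune_and_validate(lr, wd)
    if best is None or val_acc > best[0]:
        best = (val_acc, lr, wd)

# The (lr, wd) pair with the best validation accuracy is then used to
# report accuracy on the test set.
```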