LEEP: A New Measure to Evaluate Transferability of Learned Representations

Authors: Cuong Nguyen, Tal Hassner, Matthias Seeger, Cedric Archambeau

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate our LEEP measure in several scenarios. We show that the measure is useful for predicting the performance of two commonly used transfer learning algorithms, head classifier re-training (Donahue et al., 2014; Razavian et al., 2014) and model fine-tuning (Agrawal et al., 2014; Girshick et al., 2014), not only for large target data sets, but also for small or imbalanced target data sets that are difficult to use for re-training. We evaluate the ability of LEEP to predict the performance of transfer and meta-transfer learning algorithms, prior to applying these algorithms in practice. We further show that LEEP is useful even in the small or imbalanced data settings, where training on the target task could be hard. We compare LEEP with the state-of-the-art NCE transferability measure of Tran et al. (2019) and the H-score of Bao et al. (2019). Finally, we demonstrate the use of LEEP for source model selection. Our experiments are implemented in Gluon/MXNet (Chen et al., 2015; Guo et al., 2019).
Researcher Affiliation | Industry | Amazon Web Services; Facebook AI (work done before joining Facebook).
Pseudocode | No | The paper describes the steps of the LEEP measure verbally and mathematically but does not include structured pseudocode or algorithm blocks; a reconstructed sketch is given after this table.
Open Source Code | No | The paper does not provide an explicit statement or link regarding the public availability of the source code for the described methodology.
Open Datasets | Yes | ImageNet (Russakovsky et al., 2015) and ResNet20 (He et al., 2016), which is pre-trained on CIFAR10 (Krizhevsky, 2009). For each model, we construct 200 different target tasks from the CIFAR100 data set (Krizhevsky, 2009). We further add experiments where target data sets are constructed from the Fashion MNIST data set (Xiao et al., 2017).
Dataset Splits | No | The paper mentions training and test sets and specific numbers of examples for small data scenarios, but does not describe explicit train/validation/test split procedures or ratios.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper states that experiments are implemented in Gluon/MXNet (Chen et al., 2015; Guo et al., 2019), but no version numbers are provided for reproducibility.
Experiment Setup | Yes | In all tests, we ran SGD for 100 epochs with learning rate 0.01 and batch size 10 since they were sufficient to obtain good transferred models.
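
As noted in the Pseudocode row, the paper describes the LEEP score only in prose and equations. For reference, here is a minimal NumPy sketch of that score as we read it from the paper's description; the function name, argument layout, and the assumption that target labels are zero-indexed integers are our own illustrative choices, not code from the authors.

```python
import numpy as np

def leep(source_probs, target_labels):
    """Sketch of the LEEP score as described by Nguyen et al. (2020).

    source_probs : (n, |Z|) array of the source model's softmax outputs
                   ("dummy label distributions") over the source label set Z
                   for the n target examples.
    target_labels: (n,) integer array of target labels in {0, ..., |Y|-1}.
    """
    n, num_source_labels = source_probs.shape
    num_target_labels = int(target_labels.max()) + 1

    # Empirical joint distribution P_hat(y, z) over target and source labels.
    joint = np.zeros((num_target_labels, num_source_labels))
    for y in range(num_target_labels):
        joint[y] = source_probs[target_labels == y].sum(axis=0) / n

    # Empirical conditional P_hat(y | z) = P_hat(y, z) / P_hat(z).
    conditional = joint / joint.sum(axis=0, keepdims=True)

    # LEEP = (1/n) * sum_i log( sum_z P_hat(y_i | z) * theta(x_i)_z )
    expected_emp_pred = (source_probs * conditional[target_labels]).sum(axis=1)
    return float(np.log(expected_emp_pred).mean())
```

In this reading, `source_probs` would be obtained by running the pre-trained source model's softmax head over the target training examples, so the score requires only a single forward pass over the target data and no re-training.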
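
The Experiment Setup row fixes the optimizer, number of epochs, learning rate, and batch size, but the paper does not publish a training script, so the following Gluon/MXNet loop is only a hedged sketch of that setting. The placeholder `net`, `features`, and `labels` stand in for the transferred model and a target training set, which are not specified here.

```python
from mxnet import autograd, gluon, nd

# Placeholders so the sketch is self-contained; in the paper's setting these
# would be the transferred source model and the target task's training data.
net = gluon.nn.Dense(10)
net.initialize()
features = nd.random.uniform(shape=(200, 64))
labels = nd.floor(nd.random.uniform(low=0, high=10, shape=(200,)))
train_dataset = gluon.data.ArrayDataset(features, labels)

# Reported setup: SGD for 100 epochs, learning rate 0.01, batch size 10.
loader = gluon.data.DataLoader(train_dataset, batch_size=10, shuffle=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

for epoch in range(100):
    for data, label in loader:
        with autograd.record():
            loss = loss_fn(net(data), label)
        loss.backward()
        trainer.step(data.shape[0])
```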