Frustratingly Easy Transferability Estimation
Authors: Long-Kai Huang, Junzhou Huang, Yu Rong, Qiang Yang, Ying Wei
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite its extraordinary simplicity in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 32 pre-trained models and 16 downstream tasks. In this section, we evaluate the correlation between the transferability predicted by TransRate and the transfer learning performance in various settings and for different tasks. |
| Researcher Affiliation | Collaboration | Tencent AI Lab; Hong Kong University of Science and Technology; City University of Hong Kong. |
| Pseudocode | Yes | A.7. Source Codes of TransRate. We implement TransRate in Python. The codes are as follows: import numpy as np def coding_rate(Z, eps=1E-4): n, d = Z.shape (sign, rate) = np.linalg.slogdet((np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)) return 0.5 * rate def trans_rate(Z, y, eps=1E-4): Z = Z - np.mean(Z, axis=0, keepdims=True) RZ = coding_rate(Z, eps) RZY = 0. K = int(y.max() + 1) for i in range(K): RZY += coding_rate(Z[(y == i).flatten()], eps) return RZ - RZY / K (a formatted version of this listing appears after the table) |
| Open Source Code | Yes | A.7. Source Codes of TransRate. We implement TransRate in Python. The codes are as follows: import numpy as np def coding_rate(Z, eps=1E-4): n, d = Z.shape (sign, rate) = np.linalg.slogdet((np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)) return 0.5 * rate def trans_rate(Z, y, eps=1E-4): Z = Z - np.mean(Z, axis=0, keepdims=True) RZ = coding_rate(Z, eps) RZY = 0. K = int(y.max() + 1) for i in range(K): RZY += coding_rate(Z[(y == i).flatten()], eps) return RZ - RZY / K |
| Open Datasets | Yes | The source datasets are ImageNet (Russakovsky et al., 2015) and 10 image datasets from (Salman et al., 2020), including Caltech-101, Caltech-256, DTD, Flowers, SUN397, Pets, Food, Aircraft, Birds and Cars. For each source dataset, we pre-train a ResNet-18 (He et al., 2016), freeze it and discard the source data during fine-tuning. CIFAR-100 (Krizhevsky et al., 2009) and FMNIST (Xiao et al., 2017) are adopted as the target tasks. |
| Dataset Splits | No | The batch size (16, 32, 64, 128), learning rate (from 0.0001 to 0.1) and weight decay (from 1E-6 to 1E-4) are determined via grid search of the best average transfer performance over 10 runs on a validation set. For all target datasets, we use the whole training set for fine-tuning and for transferability estimation. The specific split of this 'validation set' from the training data is not provided with percentages or exact counts. |
| Hardware Specification | Yes | We run the experiments on a server with 12 Intel Xeon Platinum 8255C 2.50GHz CPU and a single P40 GPU. |
| Software Dependencies | No | We implement TransRate in Python. This only specifies the language, not specific versions of libraries or frameworks. |
| Experiment Setup | Yes | The batch size (16, 32, 64, 128), learning rate (from 0.0001 to 0.1) and weight decay (from 1E-6 to 1E-4) are determined via grid search of the best average transfer performance over 10 runs on a validation set. For fine-tuning of the target task, the feature extractor is initialized by the pre-trained model. Then the feature extractor together with a randomly initialized head is optimized by running SGD on a cross-entropy loss for 100 epochs. (A minimal sketch of this fine-tuning and grid-search protocol appears after the table.) |
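
For readability, the Appendix A.7 listing quoted in the Pseudocode and Open Source Code rows is reproduced below as a formatted, runnable block; the logic follows the quoted snippet verbatim, with only comments added.

```python
import numpy as np

def coding_rate(Z, eps=1E-4):
    # Coding rate of the feature matrix Z (n samples x d feature dimensions).
    n, d = Z.shape
    (sign, rate) = np.linalg.slogdet(np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)
    return 0.5 * rate

def trans_rate(Z, y, eps=1E-4):
    # TransRate = R(Z) - (1/K) * sum over classes k of R(Z restricted to class k).
    Z = Z - np.mean(Z, axis=0, keepdims=True)  # center the features
    RZ = coding_rate(Z, eps)
    RZY = 0.
    K = int(y.max() + 1)
    for i in range(K):
        RZY += coding_rate(Z[(y == i).flatten()], eps)
    return RZ - RZY / K
```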
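
The fine-tuning protocol described in the Experiment Setup row can be summarized as a minimal sketch, assuming PyTorch; the function name `fine_tune`, the data-loader handling, and the specific grid points inside the quoted ranges are illustrative assumptions, not the authors' released code.

```python
import itertools
import torch.nn as nn
from torch.optim import SGD
from torch.utils.data import DataLoader

def fine_tune(pretrained_backbone: nn.Module, num_classes: int,
              train_loader: DataLoader, lr: float, weight_decay: float,
              epochs: int = 100) -> nn.Module:
    # Feature extractor initialized from the pre-trained ResNet-18; the head is
    # randomly initialized; both are trained jointly with SGD on cross-entropy.
    model = pretrained_backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    optimizer = SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model

# Grid search within the quoted ranges; the exact grid points are assumed.
# Batch size is applied when building train_loader.
search_space = itertools.product(
    [16, 32, 64, 128],           # batch size
    [1e-4, 1e-3, 1e-2, 1e-1],    # learning rate, 0.0001 to 0.1
    [1e-6, 1e-5, 1e-4],          # weight decay, 1E-6 to 1E-4
)
```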