Frustratingly Easy Transferability Estimation
Authors: Long-Kai Huang, Junzhou Huang, Yu Rong, Qiang Yang, Ying Wei
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite its extraordinary simplicity in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 32 pre-trained models and 16 downstream tasks. In this section, we evaluate the correlation between the transferability predicted by TransRate and the transfer learning performance in various settings and for different tasks. |
| Researcher Affiliation | Collaboration | Tencent AI Lab; Hong Kong University of Science and Technology; City University of Hong Kong. |
| Pseudocode | Yes | A.7. Source Codes of TransRate. We implement TransRate in Python. The codes are as follows: import numpy as np def coding_rate(Z, eps=1E-4): n, d = Z.shape (sign, rate) = np.linalg.slogdet((np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)) return 0.5 * rate def trans_rate(Z, y, eps=1E-4): Z = Z - np.mean(Z, axis=0, keepdims=True) RZ = coding_rate(Z, eps) RZY = 0. K = int(y.max() + 1) for i in range(K): RZY += coding_rate(Z[(y == i).flatten()], eps) return RZ - RZY / K (a formatted version of this listing appears after the table) |
| Open Source Code | Yes | A.7. Source Codes of TransRate. We implement TransRate in Python. The codes are as follows: import numpy as np def coding_rate(Z, eps=1E-4): n, d = Z.shape (sign, rate) = np.linalg.slogdet((np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)) return 0.5 * rate def trans_rate(Z, y, eps=1E-4): Z = Z - np.mean(Z, axis=0, keepdims=True) RZ = coding_rate(Z, eps) RZY = 0. K = int(y.max() + 1) for i in range(K): RZY += coding_rate(Z[(y == i).flatten()], eps) return RZ - RZY / K |
| Open Datasets | Yes | The source datasets are ImageNet (Russakovsky et al., 2015) and 10 image datasets from (Salman et al., 2020), including Caltech-101, Caltech-256, DTD, Flowers, SUN397, Pets, Food, Aircraft, Birds and Cars. For each source dataset, we pre-train a ResNet-18 (He et al., 2016), freeze it and discard the source data during fine-tuning. CIFAR-100 (Krizhevsky et al., 2009) and FMNIST (Xiao et al., 2017) are adopted as the target tasks. |
| Dataset Splits | No | The batch size (16, 32, 64, 128), learning rate (from 0.0001 to 0.1) and weight decay (from 1E-6 to 1E-4) are determined via grid search of the best average transfer performance over 10 runs on a validation set. For all target datasets, we use the whole training set for fine-tuning and for transferability estimation. The specific split of this 'validation set' from the training data is not provided with percentages or exact counts. |
| Hardware Specification | Yes | We run the experiments on a server with 12 Intel Xeon Platinum 8255C 2.50GHz CPU and a single P40 GPU. |
| Software Dependencies | No | We implement TransRate in Python. This only specifies the language, not specific versions of libraries or frameworks. |
| Experiment Setup | Yes | The batch size (16, 32, 64, 128), learning rate (from 0.0001 to 0.1) and weight decay (from 1E-6 to 1E-4) are determined via grid search of the best average transfer performance over 10 runs on a validation set. For fine-tuning of the target task, the feature extractor is initialized by the pre-trained model. Then the feature extractor together with a randomly initialized head is optimized by running SGD on a cross-entropy loss for 100 epochs. (A minimal sketch of this fine-tuning and grid-search protocol appears after the table.) |
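
For readability, the Appendix A.7 listing quoted in the Pseudocode and Open Source Code rows is reproduced below as a formatted, runnable block; the logic follows the quoted snippet verbatim, with only comments added.

```python
import numpy as np

def coding_rate(Z, eps=1E-4):
    # Coding rate of the feature matrix Z (n samples x d feature dimensions).
    n, d = Z.shape
    (sign, rate) = np.linalg.slogdet(np.eye(d) + 1 / (n * eps) * Z.transpose() @ Z)
    return 0.5 * rate

def trans_rate(Z, y, eps=1E-4):
    # TransRate = R(Z) - (1/K) * sum over classes k of R(Z restricted to class k).
    Z = Z - np.mean(Z, axis=0, keepdims=True)  # center the features
    RZ = coding_rate(Z, eps)
    RZY = 0.
    K = int(y.max() + 1)
    for i in range(K):
        RZY += coding_rate(Z[(y == i).flatten()], eps)
    return RZ - RZY / K
```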
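
The fine-tuning protocol described in the Experiment Setup row can be summarized as a minimal sketch, assuming PyTorch; the function name `fine_tune`, the data-loader handling, and the specific grid points inside the quoted ranges are illustrative assumptions, not the authors' released code.

```python
import itertools
import torch.nn as nn
from torch.optim import SGD
from torch.utils.data import DataLoader

def fine_tune(pretrained_backbone: nn.Module, num_classes: int,
              train_loader: DataLoader, lr: float, weight_decay: float,
              epochs: int = 100) -> nn.Module:
    # Feature extractor initialized from the pre-trained ResNet-18; the head is
    # randomly initialized; both are trained jointly with SGD on cross-entropy.
    model = pretrained_backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    optimizer = SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model

# Grid search within the quoted ranges; the exact grid points are assumed.
# Batch size is applied when building train_loader.
search_space = itertools.product(
    [16, 32, 64, 128],           # batch size
    [1e-4, 1e-3, 1e-2, 1e-1],    # learning rate, 0.0001 to 0.1
    [1e-6, 1e-5, 1e-4],          # weight decay, 1E-6 to 1E-4
)
```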