Deep Model Transferability from Attribution Maps

Authors: Jie Song, Yixin Chen, Xinchao Wang, Chengchao Shen, Mingli Song

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Exploring the transferability between heterogeneous tasks sheds light on their intrinsic interconnections, and consequently enables knowledge transfer from one task to another so as to reduce the training effort of the latter. In this paper, we propose an embarrassingly simple yet very efficacious approach to estimating the transferability of deep networks, especially those handling vision tasks. Unlike the seminal work of taskonomy that relies on a large number of annotations as supervision and is thus computationally cumbersome, the proposed approach requires no human annotations and imposes no constraints on the architectures of the networks. This is achieved, specifically, via projecting deep networks into a model space, wherein each network is treated as a point and the distances between two points are measured by deviations of their produced attribution maps. The proposed approach is several-magnitude times faster than taskonomy, and meanwhile preserves a task-wise topological structure highly similar to the one obtained by taskonomy. Code is available at https://github.com/zju-vipa/TransferbilityFromAttributionMaps. In our experiments, the proposed method takes about 20 GPU hours to compute the pairwise transferability relationships on one Quadro P5000 card for 20 pre-trained taskonomy models, while taskonomy takes thousands of GPU hours on the cloud for the same number of tasks. We adopt two evaluation metrics, P@K and R@K, which are widely used in the information retrieval field, to compare the model transferability constructed from our method with that from taskonomy. (A minimal sketch of the attribution-map distance and the P@K/R@K comparison appears after this table.)
Researcher Affiliation | Collaboration | Jie Song (1,3), Yixin Chen (1), Xinchao Wang (2), Chengchao Shen (1), Mingli Song (1,3); (1) Zhejiang University, (2) Stevens Institute of Technology, (3) Alibaba-Zhejiang University Joint Institute of Frontier Technologies
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/zju-vipa/TransferbilityFromAttributionMaps.
Open Datasets | Yes | We build three datasets, taskonomy data [37], indoor scene [20], and COCO [16], as the probe data to evaluate our method.
Dataset Splits | No | The paper does not specify train/validation/test splits for its own method, as it works with pre-trained models. It mentions 'validation data from taskonomy' in the context of SVCCA, but not as training data for its own approach.
Hardware Specification | Yes | In our experiments, the proposed method takes about 20 GPU hours to compute the pairwise transferability relationships on one Quadro P5000 card for 20 pre-trained taskonomy models.
Software Dependencies | No | The paper mentions 'Tensorflow' but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | No | The paper describes the general approach and the use of pre-trained models and probe data, but it does not specify hyperparameter values or detailed system-level training settings for its method, as it does not involve training new models.
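
As a reading aid for the method summarized in the Research Type row, the sketch below is a minimal Python illustration, not the authors' released TensorFlow implementation. It assumes gradient-times-input attribution maps and a mean cosine deviation between flattened maps as the model-space distance; `precision_recall_at_k` is a hypothetical helper for the P@K/R@K comparison against a reference (e.g., taskonomy) source-task ranking.

```python
import torch
import torch.nn.functional as F

def attribution_maps(model, probe_images):
    """Gradient-times-input attribution maps over a shared probe batch.
    (One common attribution choice; the paper's released code may differ.)"""
    x = probe_images.detach().clone().requires_grad_(True)
    out = model(x)
    # Reduce the task output to a scalar so input gradients are defined.
    out.sum().backward()
    return (x.grad * x).detach()

def model_distance(maps_a, maps_b):
    """Deviation between two models' attribution maps on the same probe images
    (mean cosine deviation over flattened maps; an assumed metric)."""
    a = maps_a.flatten(start_dim=1)
    b = maps_b.flatten(start_dim=1)
    return (1.0 - F.cosine_similarity(a, b, dim=1)).mean().item()

def precision_recall_at_k(predicted_ranking, reference_topk, k):
    """P@K and R@K of a predicted source-task ranking versus a reference
    (e.g., taskonomy's) top-K source tasks for the same target task."""
    hits = len(set(predicted_ranking[:k]) & set(reference_topk))
    return hits / k, hits / len(reference_topk)
```

With attribution maps computed once per pre-trained model on the shared probe images, pairwise distances define the model-space ranking, which is then compared to taskonomy's affinity ranking via P@K and R@K.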