Sparse Deep Transfer Learning for Convolutional Neural Network

Authors: Jiaming Liu, Yali Wang, Yu Qiao

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To examine the effectiveness of our methods, we perform our sparse deep transfer learning approach on a number of benchmark transfer learning tasks. The results show that, compared to the standard fine-tuning approach, our proposed approach achieves a significant pruning rate on CNN while improving the accuracy of transfer learning.
Researcher Affiliation | Academia | Jiaming Liu (1), Yali Wang (1), Yu Qiao (1,2); (1) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; (2) The Chinese University of Hong Kong, Hong Kong; jiaming.liu@email.ucr.edu, {yl.wang, yu.qiao}@siat.ac.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in the paper.
Open Datasets | Yes | For AlexNet, we evaluate our approach on two popular transfer learning tasks (Sharif Razavian et al. 2014; Azizpour et al. 2015), where the source domain of both tasks is object recognition with ImageNet ILSVRC-2012 (1000 object classes, > 1 million images) (Deng et al. 2009). The target domains are, respectively, scene recognition with MIT Indoor67 (67 scene classes, 15,620 images) (Quattoni and Torralba 2009) and fine-grained flower recognition with Flower102 (102 flower classes, 7,169 images) (Nilsback and Zisserman 2008). For the 16-layer VGGNet, we evaluate our approach on a transfer learning task for human action recognition in videos (Simonyan and Zisserman 2014a; Wang et al. 2015), where the source dataset is UCF101 (101 action classes, 13,320 videos) (Soomro, Zamir, and Shah 2012) and the target dataset is HMDB51 (51 action classes, 6,849 videos) (Kuehne et al. 2013).
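For readers who want to reproduce the target-domain data pipeline, the snippet below is a minimal sketch of loading two of the target datasets with torchvision. The paper's experiments were run in Caffe, so the library choice, directory paths, and transform settings here are assumptions for illustration, not the authors' setup.

```python
# Hedged sketch: loading target datasets with torchvision (paths and
# transforms are illustrative assumptions; the paper itself used Caffe).
from torchvision import datasets, transforms as T

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

# Flower102 (102 flower classes) has a built-in torchvision loader.
flower102_train = datasets.Flowers102(
    root="data/flower102", split="train", transform=transform, download=True)

# MIT Indoor67 has no built-in loader; an ImageFolder directory layout
# (one sub-folder per scene class) is assumed here.
indoor67_train = datasets.ImageFolder(
    root="data/indoor67/train", transform=transform)
```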
Dataset Splits | No | The paper mentions dataset sizes and uses standard benchmarks, but does not explicitly report the training/validation/test splits (percentages, sample counts, or splitting methodology) used in its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the 'Caffe model zoo' but does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the implicit-knowledge-extraction branch, we feed target images (MIT Indoor67 or Flower102) into Reference Source Net, and output the implicit knowledge by the softened softmax (τ is four). For the extra branch, we initialize it by copying FC7 and the output layer of the implicit-knowledge-extraction branch. Finally, we train Hybrid-Transfer Net by the total loss in Eq. (3). The weight λ is one, so that the value of λ·L_extra is about 0.1·L_main in Eq. (3). All the settings of our approach are the same as before, except that: the basic structure is switched from AlexNet to VGGNet; the corresponding output layers are switched to the classes of UCF101 and HMDB51; the extra branch is added on the 14th layer of VGGNet (an inner-product layer called FC6) for both streams; Sparse-Target Net is obtained by only pruning connections in the target domain, due to the limited data in both UCF101 and HMDB51; λ is set as two for the total training loss in Eq. (3); and the proportion of spatial/temporal streams is one/four for output fusion of the two-stream net.
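To make the setup above concrete, here is a minimal PyTorch-style sketch of three ingredients it describes: the softened softmax with temperature τ = 4 used to extract implicit knowledge from the reference source net, the total loss of Eq. (3) (L_main plus λ·L_extra, with λ = 1 for the AlexNet tasks and λ = 2 for the VGGNet tasks), and the 1:4 spatial/temporal output fusion of the two-stream net. The function names and the exact form of L_extra (a cross-entropy against the softened source predictions) are assumptions for illustration; the original experiments were run in Caffe.

```python
import torch.nn.functional as F

def softened_softmax(logits, tau=4.0):
    """Softened softmax used to extract implicit knowledge (tau = 4 in the paper)."""
    return F.softmax(logits / tau, dim=1)

def hybrid_transfer_loss(main_logits, hard_labels, extra_logits, soft_targets,
                         lam=1.0, tau=4.0):
    """Total loss in the spirit of Eq. (3): L_main on the hard target labels plus
    lam * L_extra, where L_extra ties the extra branch to the softened predictions
    of the reference source net (the cross-entropy form is an assumption)."""
    l_main = F.cross_entropy(main_logits, hard_labels)
    log_p_extra = F.log_softmax(extra_logits / tau, dim=1)
    l_extra = -(soft_targets * log_p_extra).sum(dim=1).mean()
    return l_main + lam * l_extra

def fuse_two_stream(spatial_scores, temporal_scores, w_spatial=1.0, w_temporal=4.0):
    """Weighted output fusion of the two-stream net (spatial:temporal = 1:4)."""
    return (w_spatial * spatial_scores + w_temporal * temporal_scores) / (w_spatial + w_temporal)
```

In training, soft_targets would be obtained by applying softened_softmax to the reference source net's logits on the same target images that are fed to the main branch.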