TransTailor: Pruning the Pre-trained Model for Improved Transfer Learning

Authors: Bingyan Liu, Yifeng Cai, Yao Guo, Xiangqun Chen (pp. 8627-8634)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple pre-trained models and datasets demonstrate that TransTailor outperforms the traditional pruning methods and achieves competitive or even better performance than other state-of-the-art transfer learning methods while using a smaller model.
Researcher Affiliation | Academia | Bingyan Liu1, Yifeng Cai2, Yao Guo1, Xiangqun Chen1; 1MOE Key Lab of HCST, Dept. of Computer Science, School of EECS, Peking University; 2School of Software and Microelectronics, Peking University; {lby cs, caiyifeng, yaoguo, cherry}@pku.edu.cn
Pseudocode | Yes | Algorithm 1: The Pipeline of TransTailor
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for TransTailor.
Open Datasets | Yes | We evaluate TransTailor on the following five datasets, which are widely used in transfer learning: Caltech256-30 & Caltech256-60 (Griffin, Holub, and Perona 2007), CUB-200 (Wah et al. 2011), Stanford Dogs (Khosla et al. 2011), and MIT Indoor-67 (Quattoni and Torralba 2009). All models are pre-trained on the ImageNet 2012 (Deng et al. 2009) dataset.
Dataset Splits | No | The paper mentions '80 training images and 20 test images' for MIT Indoor-67 and '30 and 60 randomly sampled training examples' for Caltech256, and the iterative process selects the optimal sub-model by accuracy on the target data Dt, but it does not specify a reproducible train/validation/test split for all datasets; in particular, no dedicated validation-set size or percentage is stated.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types).
Software Dependencies | No | All experiments are conducted with the PyTorch framework, and pre-trained models are provided by Torchvision. However, no specific version numbers for PyTorch or Torchvision are given.
Experiment Setup | Yes | The learning rate is set to 0.005 for the FC layer and 0.0005 for Conv layers; τ is set to 0.3. After 10% of the pre-trained model's FLOPs are pruned with the paper's target-aware pruning, importance-aware fine-tuning runs for 40 epochs for ResNet-101 and 60 epochs for VGG-16.
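
The per-layer learning rates quoted in the setup row can be expressed as PyTorch parameter groups. This is a minimal sketch under stated assumptions, not the paper's training code: the model below is a toy stand-in for the pruned backbone (the paper fine-tunes Torchvision's ResNet-101 and VGG-16), and the choice of plain SGD is an assumption, since the report does not name the optimizer.

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module) -> torch.optim.SGD:
    # Split parameters into the final FC layer vs. everything else,
    # mirroring the reported setting: lr 0.005 (FC), 0.0005 (Conv).
    fc_params, conv_params = [], []
    for name, param in model.named_parameters():
        (fc_params if name.startswith("fc") else conv_params).append(param)
    return torch.optim.SGD([
        {"params": conv_params, "lr": 0.0005},
        {"params": fc_params, "lr": 0.005},
    ])

# Toy stand-in for a pruned pre-trained backbone (hypothetical; the
# paper's actual models come from Torchvision).
model = nn.Sequential()
model.add_module("conv", nn.Conv2d(3, 8, kernel_size=3))
model.add_module("pool", nn.AdaptiveAvgPool2d(1))
model.add_module("flat", nn.Flatten())
model.add_module("fc", nn.Linear(8, 10))

optimizer = build_optimizer(model)
```

The two parameter groups let a single `optimizer.step()` apply both rates at once; the same pattern extends to real Torchvision backbones by matching the name of the final classifier module instead of `"fc"`.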